metadata
license: gemma
base_model: google/gemma-2-2b
tags:
  - trl
  - sft
  - generated_from_trainer
model-index:
  - name: collapse_gemma-2-2b_hs2_replace_iter8_sftsd1
    results: []

collapse_gemma-2-2b_hs2_replace_iter8_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5550
  • Num Input Tokens Seen: 8095056

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
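As a minimal sketch of how the batch-size and scheduler settings above fit together, the plain-Python snippet below makes three assumptions. It assumes a single training device, so total_train_batch_size = train_batch_size × gradient_accumulation_steps. It assumes about 158 optimizer steps in the single epoch, a figure inferred from the Epoch column of the training results (step 5 corresponds to epoch 0.0316, and 5 / 0.0316 ≈ 158), not stated in the card. It also assumes the warmup count is rounded up from total_steps × warmup_ratio, which is our understanding of how Transformers handles warmup_ratio. The schedule follows the `constant_with_warmup` rule: the learning rate ramps linearly from 0 to the base value during warmup and then stays constant. All helper names are hypothetical.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_RATIO = 0.05

# Assumption: ~158 optimizer steps in the single epoch, inferred from the
# Epoch column of the training results (5 / 0.0316 is roughly 158).
ASSUMED_TOTAL_STEPS = 158


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step (one device assumed by default)."""
    return per_device * accum_steps * num_devices


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Warmup length in optimizer steps, rounded up (assumed rounding rule)."""
    return math.ceil(total_steps * warmup_ratio)


def constant_with_warmup_lr(step: int, base_lr: float, num_warmup: int) -> float:
    """Linear ramp from 0 to base_lr over num_warmup steps, then constant."""
    if step < num_warmup:
        return base_lr * step / max(1, num_warmup)
    return base_lr


if __name__ == "__main__":
    total_bs = effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS)
    n_warmup = warmup_steps(ASSUMED_TOTAL_STEPS, WARMUP_RATIO)
    print("total_train_batch_size:", total_bs)
    print("warmup steps:", n_warmup)
    for step in (0, 4, 8, 100):
        print(step, constant_with_warmup_lr(step, LEARNING_RATE, n_warmup))
```

Under these assumptions the warmup lasts 8 optimizer steps (ceil(158 × 0.05) = 8), after which the learning rate holds at 8e-06 for the rest of the epoch.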

Training results

Training Loss  Epoch   Step  Validation Loss  Input Tokens Seen
No log         0       0     1.3956           0
1.7525         0.0316  5     1.3104           254840
1.2052         0.0632  10    1.2361           505744
0.829          0.0948  15    1.3012           764784
0.5192         0.1264  20    1.5065           1017080
0.3515         0.1580  25    1.6064           1273384
0.2183         0.1896  30    1.7810           1528696
0.15           0.2212  35    1.9429           1786696
0.1138         0.2528  40    2.1156           2047952
0.059          0.2844  45    2.2852           2304208
0.0473         0.3160  50    2.3658           2562872
0.0341         0.3476  55    2.4285           2820024
0.033          0.3791  60    2.5434           3081376
0.0283         0.4107  65    2.5781           3330816
0.0293         0.4423  70    2.5558           3576176
0.0301         0.4739  75    2.5472           3824776
0.0286         0.5055  80    2.5378           4086992
0.0761         0.5371  85    2.5431           4343816
0.0281         0.5687  90    2.5042           4593704
0.0267         0.6003  95    2.4403           4863272
0.0277         0.6319  100   2.3900           5119864
0.028          0.6635  105   2.3840           5376216
0.0259         0.6951  110   2.4084           5631856
0.0245         0.7267  115   2.4373           5885432
0.0261         0.7583  120   2.4586           6140608
0.0265         0.7899  125   2.4941           6400528
0.0264         0.8215  130   2.5242           6657312
0.0256         0.8531  135   2.5193           6909008
0.0268         0.8847  140   2.5183           7169664
0.0259         0.9163  145   2.5392           7429120
0.0269         0.9479  150   2.5512           7684536
0.0232         0.9795  155   2.5518           7946120
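The training results can also be summarized programmatically. The snippet below copies the Step, Validation Loss, and Input Tokens Seen columns verbatim from the table. It reports the step with the lowest validation loss and the average number of input tokens per optimizer step over the logged interval. The function names are hypothetical, and the snippet is only a reading aid, not part of the original training code.

```python
# Columns copied from the training results table above (logged every 5 steps).
STEPS = list(range(0, 160, 5))  # 0, 5, ..., 155
VAL_LOSS = [
    1.3956, 1.3104, 1.2361, 1.3012, 1.5065, 1.6064, 1.7810, 1.9429,
    2.1156, 2.2852, 2.3658, 2.4285, 2.5434, 2.5781, 2.5558, 2.5472,
    2.5378, 2.5431, 2.5042, 2.4403, 2.3900, 2.3840, 2.4084, 2.4373,
    2.4586, 2.4941, 2.5242, 2.5193, 2.5183, 2.5392, 2.5512, 2.5518,
]
TOKENS_SEEN = [
    0, 254840, 505744, 764784, 1017080, 1273384, 1528696, 1786696,
    2047952, 2304208, 2562872, 2820024, 3081376, 3330816, 3576176, 3824776,
    4086992, 4343816, 4593704, 4863272, 5119864, 5376216, 5631856, 5885432,
    6140608, 6400528, 6657312, 6909008, 7169664, 7429120, 7684536, 7946120,
]


def best_checkpoint(steps, losses):
    """Return (step, loss) for the lowest validation loss in the log."""
    best_idx = min(range(len(losses)), key=losses.__getitem__)
    return steps[best_idx], losses[best_idx]


def mean_tokens_per_step(steps, tokens):
    """Average input tokens per optimizer step between the first and last log."""
    return (tokens[-1] - tokens[0]) / (steps[-1] - steps[0])


if __name__ == "__main__":
    assert len(STEPS) == len(VAL_LOSS) == len(TOKENS_SEEN)
    step, loss = best_checkpoint(STEPS, VAL_LOSS)
    print(f"lowest validation loss: {loss} at step {step}")
    print(f"mean input tokens per step: {mean_tokens_per_step(STEPS, TOKENS_SEEN):.0f}")
```

With the values in the table, the lowest validation loss is 1.2361 at step 10. After that, validation loss rises above 2.3 by step 50 while training loss falls below 0.05. The average is about 51,265 input tokens per optimizer step (7,946,120 tokens over 155 steps).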

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1