
collapse_gemma-2-2b_hs2_replace_iter9_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6415
  • Num Input Tokens Seen: 7914176
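
Since the card does not yet specify intended usage, what follows is only a minimal loading sketch with transformers. The repository id is taken from the model tree at the bottom of this card; device_map="auto" assumes accelerate is installed, and BF16 loading matches the stored tensor type.

```python
# Minimal loading sketch; the checkpoint id comes from the model tree below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_replace_iter9_sftsd0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the checkpoint is stored in BF16
    device_map="auto",           # assumption: accelerate is installed
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```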

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
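
For reference, a sketch of how these settings map onto transformers.TrainingArguments. The output_dir is a placeholder and bf16=True is inferred from the checkpoint's BF16 tensor type; every other value is taken directly from the list above.

```python
# Sketch of the reported hyperparameters as TrainingArguments;
# output_dir and bf16 are assumptions, the rest match the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_replace_iter9_sftsd0",  # placeholder
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumed from the BF16 tensor type of the checkpoint
)
```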

Training results

Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen
------------- | ------ | ---- | --------------- | -----------------
No log        | 0      | 0    | 1.3956          | 0
1.6327        | 0.0315 | 5    | 1.3089          | 247632
1.1975        | 0.0630 | 10   | 1.2561          | 504648
0.7639        | 0.0945 | 15   | 1.3571          | 747296
0.513         | 0.1259 | 20   | 1.5568          | 1008344
0.3238        | 0.1574 | 25   | 1.6800          | 1260640
0.1575        | 0.1889 | 30   | 1.8364          | 1510496
0.1051        | 0.2204 | 35   | 2.0453          | 1761320
0.0558        | 0.2519 | 40   | 2.1727          | 2016736
0.0395        | 0.2834 | 45   | 2.3560          | 2261448
0.0314        | 0.3148 | 50   | 2.4808          | 2507808
0.0264        | 0.3463 | 55   | 2.5543          | 2762208
0.0291        | 0.3778 | 60   | 2.5322          | 3006848
0.0324        | 0.4093 | 65   | 2.5662          | 3253536
0.0291        | 0.4408 | 70   | 2.6164          | 3500992
0.0271        | 0.4723 | 75   | 2.6075          | 3746280
0.0269        | 0.5037 | 80   | 2.5847          | 3999584
0.0248        | 0.5352 | 85   | 2.5952          | 4246920
0.0244        | 0.5667 | 90   | 2.6096          | 4503608
0.0247        | 0.5982 | 95   | 2.6188          | 4754536
0.0233        | 0.6297 | 100  | 2.6244          | 5005328
0.0248        | 0.6612 | 105  | 2.6260          | 5255024
0.0237        | 0.6926 | 110  | 2.6294          | 5501344
0.0277        | 0.7241 | 115  | 2.6373          | 5757840
0.0256        | 0.7556 | 120  | 2.6263          | 6010232
0.0254        | 0.7871 | 125  | 2.6241          | 6256112
0.0232        | 0.8186 | 130  | 2.6195          | 6504184
0.0243        | 0.8501 | 135  | 2.6216          | 6755968
0.0243        | 0.8815 | 140  | 2.6244          | 7006776
0.0255        | 0.9130 | 145  | 2.6225          | 7262768
0.0251        | 0.9445 | 150  | 2.6236          | 7515400
0.0243        | 0.9760 | 155  | 2.6327          | 7765712

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
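
A small sanity check against the versions listed above can help avoid environment drift when reproducing results; the import names are the standard ones for these packages.

```python
# Check that installed packages match the framework versions on this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": (transformers.__version__, "4.44.0"),
    "torch": (torch.__version__, "2.4.0+cu121"),
    "datasets": (datasets.__version__, "2.20.0"),
    "tokenizers": (tokenizers.__version__, "0.19.1"),
}
for name, (found, wanted) in expected.items():
    if found != wanted:
        print(f"{name}: found {found}, card was built with {wanted}")
```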

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16

Model tree for jkazdan/collapse_gemma-2-2b_hs2_replace_iter9_sftsd0

  • Base model: google/gemma-2-2b
  • Finetuned: this model