
collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the list):

  • Loss: 1.1103
  • Num Input Tokens Seen: 30159864
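
As a quick usage sketch (not part of the original card), the checkpoint can be loaded with the Hugging Face transformers API. The repository id below is taken from the model tree at the end of this card; the bfloat16 dtype is an assumption based on the listed BF16 tensor type, and the prompt is illustrative only.

```python
# Minimal inference sketch, assuming a standard transformers setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: matches the BF16 tensor type listed below
    device_map="auto",
)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```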

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hypothetical TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
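
The sketch below is a hypothetical reconstruction of this configuration with transformers.TrainingArguments. The output directory, the bf16 flag, and the 5-step evaluation cadence are assumptions inferred from the rest of the card rather than taken from the original training script, and the actual Trainer/SFT wiring is not shown.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the listed hyperparameters (not the author's script).
training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1",  # placeholder path
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,   # 8 x 16 on a single device -> effective batch of 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999), epsilon=1e-08 as listed
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",            # assumption: matches the 5-step cadence in the results table
    eval_steps=5,
    logging_steps=5,
    bf16=True,                        # assumption: consistent with the BF16 checkpoint
)
```

At these settings each optimizer step covers 128 examples; dividing the roughly 30.0M input tokens seen by the 545 logged steps gives about 55k tokens per step, i.e. around 430 input tokens per example on average.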

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| No log | 0 | 0 | 1.3956 | 0 |
| 1.6921 | 0.0091 | 5 | 1.3865 | 277592 |
| 1.5157 | 0.0183 | 10 | 1.3199 | 553240 |
| 1.4327 | 0.0274 | 15 | 1.2512 | 835040 |
| 1.3634 | 0.0366 | 20 | 1.1942 | 1119152 |
| 1.2062 | 0.0457 | 25 | 1.1630 | 1386248 |
| 1.1502 | 0.0548 | 30 | 1.1451 | 1658184 |
| 1.1499 | 0.0640 | 35 | 1.1355 | 1932840 |
| 1.0385 | 0.0731 | 40 | 1.1430 | 2203400 |
| 1.0015 | 0.0822 | 45 | 1.1660 | 2478336 |
| 0.898 | 0.0914 | 50 | 1.1775 | 2749216 |
| 0.8754 | 0.1005 | 55 | 1.1909 | 3024568 |
| 0.7831 | 0.1097 | 60 | 1.2013 | 3297256 |
| 0.7973 | 0.1188 | 65 | 1.2082 | 3567512 |
| 0.6224 | 0.1279 | 70 | 1.1975 | 3832728 |
| 0.7229 | 0.1371 | 75 | 1.2022 | 4107456 |
| 0.6716 | 0.1462 | 80 | 1.2067 | 4381328 |
| 0.6282 | 0.1554 | 85 | 1.1985 | 4664272 |
| 0.6613 | 0.1645 | 90 | 1.1931 | 4946808 |
| 0.5538 | 0.1736 | 95 | 1.1930 | 5225232 |
| 0.5592 | 0.1828 | 100 | 1.1906 | 5499184 |
| 0.4737 | 0.1919 | 105 | 1.1943 | 5773464 |
| 0.4775 | 0.2011 | 110 | 1.1922 | 6045360 |
| 0.5431 | 0.2102 | 115 | 1.1878 | 6319560 |
| 0.4571 | 0.2193 | 120 | 1.1972 | 6595248 |
| 0.4625 | 0.2285 | 125 | 1.1849 | 6867392 |
| 0.4473 | 0.2376 | 130 | 1.1891 | 7145000 |
| 0.5032 | 0.2467 | 135 | 1.1884 | 7422304 |
| 0.527 | 0.2559 | 140 | 1.1812 | 7692168 |
| 0.4619 | 0.2650 | 145 | 1.1891 | 7971504 |
| 0.3861 | 0.2742 | 150 | 1.1777 | 8252232 |
| 0.368 | 0.2833 | 155 | 1.1825 | 8524736 |
| 0.3585 | 0.2924 | 160 | 1.1737 | 8803376 |
| 0.3527 | 0.3016 | 165 | 1.1859 | 9079664 |
| 0.3797 | 0.3107 | 170 | 1.1770 | 9350760 |
| 0.3966 | 0.3199 | 175 | 1.1802 | 9632672 |
| 0.4109 | 0.3290 | 180 | 1.1730 | 9909824 |
| 0.3386 | 0.3381 | 185 | 1.1750 | 10173440 |
| 0.36 | 0.3473 | 190 | 1.1711 | 10449856 |
| 0.4232 | 0.3564 | 195 | 1.1766 | 10723480 |
| 0.3718 | 0.3655 | 200 | 1.1686 | 10996072 |
| 0.3378 | 0.3747 | 205 | 1.1685 | 11274712 |
| 0.3298 | 0.3838 | 210 | 1.1680 | 11548536 |
| 0.2605 | 0.3930 | 215 | 1.1632 | 11819712 |
| 0.3222 | 0.4021 | 220 | 1.1657 | 12095032 |
| 0.3331 | 0.4112 | 225 | 1.1652 | 12378464 |
| 0.2945 | 0.4204 | 230 | 1.1584 | 12652256 |
| 0.2602 | 0.4295 | 235 | 1.1626 | 12933344 |
| 0.3413 | 0.4387 | 240 | 1.1585 | 13206880 |
| 0.3522 | 0.4478 | 245 | 1.1545 | 13481312 |
| 0.3239 | 0.4569 | 250 | 1.1541 | 13757280 |
| 0.33 | 0.4661 | 255 | 1.1550 | 14035648 |
| 0.3271 | 0.4752 | 260 | 1.1496 | 14314056 |
| 0.3631 | 0.4844 | 265 | 1.1574 | 14591184 |
| 0.2662 | 0.4935 | 270 | 1.1473 | 14869784 |
| 0.3374 | 0.5026 | 275 | 1.1495 | 15145912 |
| 0.377 | 0.5118 | 280 | 1.1476 | 15422056 |
| 0.3415 | 0.5209 | 285 | 1.1429 | 15701624 |
| 0.3588 | 0.5300 | 290 | 1.1448 | 15975448 |
| 0.2623 | 0.5392 | 295 | 1.1429 | 16251672 |
| 0.3372 | 0.5483 | 300 | 1.1397 | 16532768 |
| 0.3099 | 0.5575 | 305 | 1.1411 | 16807688 |
| 0.3222 | 0.5666 | 310 | 1.1403 | 17084280 |
| 0.2805 | 0.5757 | 315 | 1.1359 | 17362984 |
| 0.3158 | 0.5849 | 320 | 1.1391 | 17636368 |
| 0.3678 | 0.5940 | 325 | 1.1345 | 17909736 |
| 0.2457 | 0.6032 | 330 | 1.1353 | 18187664 |
| 0.4106 | 0.6123 | 335 | 1.1346 | 18465160 |
| 0.4054 | 0.6214 | 340 | 1.1343 | 18735840 |
| 0.4196 | 0.6306 | 345 | 1.1306 | 19013544 |
| 0.3024 | 0.6397 | 350 | 1.1335 | 19291160 |
| 0.2863 | 0.6488 | 355 | 1.1335 | 19566392 |
| 0.3069 | 0.6580 | 360 | 1.1296 | 19846576 |
| 0.4561 | 0.6671 | 365 | 1.1286 | 20120792 |
| 0.3369 | 0.6763 | 370 | 1.1289 | 20397368 |
| 0.342 | 0.6854 | 375 | 1.1292 | 20674400 |
| 0.4051 | 0.6945 | 380 | 1.1252 | 20955416 |
| 0.1938 | 0.7037 | 385 | 1.1282 | 21228600 |
| 0.2087 | 0.7128 | 390 | 1.1273 | 21509832 |
| 0.2746 | 0.7220 | 395 | 1.1244 | 21781432 |
| 0.3352 | 0.7311 | 400 | 1.1271 | 22062768 |
| 0.2967 | 0.7402 | 405 | 1.1253 | 22336688 |
| 0.2059 | 0.7494 | 410 | 1.1242 | 22617384 |
| 0.2417 | 0.7585 | 415 | 1.1241 | 22888744 |
| 0.283 | 0.7676 | 420 | 1.1219 | 23166464 |
| 0.3493 | 0.7768 | 425 | 1.1223 | 23442624 |
| 0.3613 | 0.7859 | 430 | 1.1215 | 23724456 |
| 0.2175 | 0.7951 | 435 | 1.1199 | 23997552 |
| 0.3372 | 0.8042 | 440 | 1.1209 | 24271688 |
| 0.3313 | 0.8133 | 445 | 1.1184 | 24549464 |
| 0.3209 | 0.8225 | 450 | 1.1187 | 24830048 |
| 0.2609 | 0.8316 | 455 | 1.1187 | 25105840 |
| 0.335 | 0.8408 | 460 | 1.1176 | 25383592 |
| 0.2367 | 0.8499 | 465 | 1.1171 | 25654008 |
| 0.3219 | 0.8590 | 470 | 1.1170 | 25924368 |
| 0.29 | 0.8682 | 475 | 1.1189 | 26194176 |
| 0.231 | 0.8773 | 480 | 1.1164 | 26472920 |
| 0.2929 | 0.8865 | 485 | 1.1169 | 26748736 |
| 0.2734 | 0.8956 | 490 | 1.1169 | 27018208 |
| 0.3264 | 0.9047 | 495 | 1.1150 | 27298736 |
| 0.2777 | 0.9139 | 500 | 1.1144 | 27564544 |
| 0.3015 | 0.9230 | 505 | 1.1126 | 27841416 |
| 0.3482 | 0.9321 | 510 | 1.1137 | 28115128 |
| 0.3251 | 0.9413 | 515 | 1.1132 | 28395504 |
| 0.3143 | 0.9504 | 520 | 1.1135 | 28675176 |
| 0.3316 | 0.9596 | 525 | 1.1146 | 28940144 |
| 0.3076 | 0.9687 | 530 | 1.1105 | 29217824 |
| 0.3911 | 0.9778 | 535 | 1.1112 | 29503120 |
| 0.2661 | 0.9870 | 540 | 1.1114 | 29775240 |
| 0.3464 | 0.9961 | 545 | 1.1098 | 30047440 |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
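
A quick sanity sketch (not from the original card) for checking that a local environment matches the versions listed above:

```python
# Compare installed library versions against the ones listed in this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "Transformers": (transformers.__version__, "4.44.0"),
    "PyTorch": (torch.__version__, "2.4.0+cu121"),
    "Datasets": (datasets.__version__, "2.20.0"),
    "Tokenizers": (tokenizers.__version__, "0.19.1"),
}
for name, (found, wanted) in expected.items():
    status = "OK" if found == wanted else f"differs (expected {wanted})"
    print(f"{name}: {found} - {status}")
```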
Model size: 2.61B params (Safetensors, BF16)

Model tree for jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter4_sftsd1

  • Base model: google/gemma-2-2b
  • Fine-tuned: this model