
laws_rugpt3medium_finetune

This model is a fine-tuned version of ai-forever/rugpt3large_based_on_gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4051
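Since the card does not yet include a usage snippet, here is a minimal sketch of loading the model for generation with the standard `transformers` causal-LM API. The repo id is taken from the model tree below; sampling parameters are illustrative, not the author's settings.

```python
# Minimal generation sketch for this card's model (assumes `transformers`
# and `torch` are installed; repo id taken from the model tree).
MODEL_ID = "C0uchP0tat0/laws_rugpt3medium_finetune"

def generate(prompt: str, max_new_tokens: int = 100) -> str:
    """Generate a continuation of `prompt` with the fine-tuned model."""
    # Imports are local so the module can be inspected without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,       # sampling settings are illustrative
        top_p=0.95,
        temperature=0.8,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Usage (not run here, since it downloads the checkpoint): `print(generate("Статья 1."))`.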

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 3
  • total_train_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 30
  • mixed_precision_training: Native AMP
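The list above implies an effective batch size of 4 × 3 = 12 and a linear-warmup-then-cosine learning-rate curve. A small sketch of that schedule (names are illustrative; it mirrors the shape of transformers' cosine schedule with warmup, and the exact framework behaviour may differ slightly):

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-5
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 3
WARMUP_STEPS = 1000
TOTAL_STEPS = 3225  # final step in the training-results table

# Effective train batch size: per-device batch x gradient accumulation.
TOTAL_TRAIN_BATCH_SIZE = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # 12

def lr_at(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then cosine decay toward 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Note that with warmup_steps=1000 out of roughly 3225 total steps, almost a third of training runs at below-peak learning rate.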

Training results

Training Loss Epoch Step Validation Loss
3.3772 0.23 25 3.3796
3.4598 0.46 50 3.3744
3.3981 0.69 75 3.3587
3.4916 0.93 100 3.3322
3.4166 1.16 125 3.2980
3.3829 1.39 150 3.2626
3.2992 1.62 175 3.2285
3.3237 1.85 200 3.1936
3.2106 2.08 225 3.1601
3.1947 2.31 250 3.1311
3.2183 2.55 275 3.0988
3.2124 2.78 300 3.0620
3.1725 3.01 325 3.0266
3.078 3.24 350 2.9931
3.0387 3.47 375 2.9595
3.0944 3.7 400 2.9194
3.049 3.94 425 2.8818
2.9818 4.17 450 2.8438
2.9278 4.4 475 2.8074
2.9172 4.63 500 2.7671
2.8432 4.86 525 2.7233
2.8499 5.09 550 2.6794
2.76 5.32 575 2.6310
2.7197 5.56 600 2.5857
2.793 5.79 625 2.5458
2.6895 6.02 650 2.4991
2.651 6.25 675 2.4496
2.5484 6.48 700 2.4014
2.5728 6.71 725 2.3471
2.4865 6.94 750 2.2953
2.4388 7.18 775 2.2369
2.4137 7.41 800 2.1799
2.3262 7.64 825 2.1285
2.3043 7.87 850 2.0836
2.2541 8.1 875 2.0299
2.1348 8.33 900 1.9730
2.1904 8.56 925 1.9211
2.0869 8.8 950 1.8719
2.1606 9.03 975 1.8210
1.9323 9.26 1000 1.7712
1.9892 9.49 1025 1.7254
1.9407 9.72 1050 1.6757
1.8791 9.95 1075 1.6214
1.7791 10.19 1100 1.5702
1.7523 10.42 1125 1.5284
1.7336 10.65 1150 1.4912
1.7709 10.88 1175 1.4475
1.6533 11.11 1200 1.3941
1.5671 11.34 1225 1.3536
1.5394 11.57 1250 1.3209
1.6085 11.81 1275 1.2921
1.5465 12.04 1300 1.2599
1.4172 12.27 1325 1.2292
1.4422 12.5 1350 1.1927
1.4708 12.73 1375 1.1563
1.3859 12.96 1400 1.1260
1.2036 13.19 1425 1.0932
1.3393 13.43 1450 1.0697
1.3203 13.66 1475 1.0376
1.2902 13.89 1500 1.0084
1.2356 14.12 1525 0.9760
1.2329 14.35 1550 0.9531
1.2039 14.58 1575 0.9343
1.1521 14.81 1600 0.9084
1.0754 15.05 1625 0.8786
1.0786 15.28 1650 0.8620
1.1052 15.51 1675 0.8395
1.0765 15.74 1700 0.8192
1.0817 15.97 1725 0.8002
1.0285 16.2 1750 0.7715
1.0313 16.44 1775 0.7612
0.9682 16.67 1800 0.7458
1.0025 16.9 1825 0.7267
0.9516 17.13 1850 0.7052
0.9475 17.36 1875 0.6952
0.8851 17.59 1900 0.6745
0.9463 17.82 1925 0.6602
0.8937 18.06 1950 0.6436
0.8135 18.29 1975 0.6316
0.8738 18.52 2000 0.6172
0.8585 18.75 2025 0.6072
0.8782 18.98 2050 0.5968
0.8324 19.21 2075 0.5789
0.7818 19.44 2100 0.5688
0.8375 19.68 2125 0.5602
0.7838 19.91 2150 0.5498
0.8015 20.14 2175 0.5369
0.724 20.37 2200 0.5299
0.7298 20.6 2225 0.5233
0.8079 20.83 2250 0.5141
0.77 21.06 2275 0.5058
0.7299 21.3 2300 0.4995
0.7152 21.53 2325 0.4893
0.6905 21.76 2350 0.4882
0.7492 21.99 2375 0.4779
0.6817 22.22 2400 0.4681
0.6893 22.45 2425 0.4652
0.7098 22.69 2450 0.4611
0.7063 22.92 2475 0.4582
0.6562 23.15 2500 0.4511
0.7083 23.38 2525 0.4474
0.6684 23.61 2550 0.4438
0.6688 23.84 2575 0.4398
0.6561 24.07 2600 0.4334
0.6664 24.31 2625 0.4318
0.6418 24.54 2650 0.4294
0.6723 24.77 2675 0.4249
0.6164 25.0 2700 0.4215
0.6348 25.23 2725 0.4203
0.6464 25.46 2750 0.4182
0.6392 25.69 2775 0.4171
0.6186 25.93 2800 0.4156
0.6447 26.16 2825 0.4138
0.6445 26.39 2850 0.4114
0.6037 26.62 2875 0.4109
0.6074 26.85 2900 0.4099
0.6509 27.08 2925 0.4092
0.6416 27.31 2950 0.4082
0.6391 27.55 2975 0.4075
0.594 27.78 3000 0.4071
0.6231 28.01 3025 0.4066
0.6151 28.24 3050 0.4061
0.6464 28.47 3075 0.4056
0.6024 28.7 3100 0.4054
0.6277 28.94 3125 0.4052
0.6017 29.17 3150 0.4052
0.6226 29.4 3175 0.4051
0.6084 29.63 3200 0.4051
0.639 29.86 3225 0.4051
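Assuming the reported loss is mean token-level cross-entropy in nats (the transformers default for causal LMs), it converts to perplexity via exp(loss), which gives a quick sense of the improvement over training:

```python
import math

# Losses read from the table above.
initial_eval_loss = 3.3796  # first evaluation, step 25
final_eval_loss = 0.4051    # final evaluation, step 3225

# Perplexity = exp(cross-entropy loss in nats).
initial_ppl = math.exp(initial_eval_loss)
final_ppl = math.exp(final_eval_loss)

print(f"perplexity: {initial_ppl:.2f} -> {final_ppl:.2f}")
```

So validation perplexity drops from roughly 29 to roughly 1.5 over the 30 epochs.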

Framework versions

  • Transformers 4.35.2
  • PyTorch 2.1.0+cu121
  • Datasets 2.16.0
  • Tokenizers 0.15.0

Model tree for C0uchP0tat0/laws_rugpt3medium_finetune
