
laws_rugpt3medium_finetune

This model is a fine-tuned version of ai-forever/rugpt3large_based_on_gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4051
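Since the card does not yet include a usage snippet, here is a minimal sketch of loading the model for generation with the standard `transformers` causal-LM API. The repo id is taken from the model tree below; sampling parameters are illustrative, not the author's settings.

```python
# Minimal generation sketch for this card's model (assumes `transformers`
# and `torch` are installed; repo id taken from the model tree).
MODEL_ID = "C0uchP0tat0/laws_rugpt3medium_finetune"

def generate(prompt: str, max_new_tokens: int = 100) -> str:
    """Generate a continuation of `prompt` with the fine-tuned model."""
    # Imports are local so the module can be inspected without the heavy deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,       # sampling settings are illustrative
        top_p=0.95,
        temperature=0.8,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Usage (not run here, since it downloads the checkpoint): `print(generate("Статья 1."))`.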

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 3
  • total_train_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 30
  • mixed_precision_training: Native AMP
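The list above implies an effective batch size of 4 × 3 = 12 and a linear-warmup-then-cosine learning-rate curve. A small sketch of that schedule (names are illustrative; it mirrors the shape of transformers' cosine schedule with warmup, and the exact framework behaviour may differ slightly):

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-5
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 3
WARMUP_STEPS = 1000
TOTAL_STEPS = 3225  # final step in the training-results table

# Effective train batch size: per-device batch x gradient accumulation.
TOTAL_TRAIN_BATCH_SIZE = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # 12

def lr_at(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then cosine decay toward 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Note that with warmup_steps=1000 out of roughly 3225 total steps, almost a third of training runs at below-peak learning rate.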

Training results

Training Loss Epoch Step Validation Loss
3.3772 0.23 25 3.3796
3.4598 0.46 50 3.3744
3.3981 0.69 75 3.3587
3.4916 0.93 100 3.3322
3.4166 1.16 125 3.2980
3.3829 1.39 150 3.2626
3.2992 1.62 175 3.2285
3.3237 1.85 200 3.1936
3.2106 2.08 225 3.1601
3.1947 2.31 250 3.1311
3.2183 2.55 275 3.0988
3.2124 2.78 300 3.0620
3.1725 3.01 325 3.0266
3.078 3.24 350 2.9931
3.0387 3.47 375 2.9595
3.0944 3.7 400 2.9194
3.049 3.94 425 2.8818
2.9818 4.17 450 2.8438
2.9278 4.4 475 2.8074
2.9172 4.63 500 2.7671
2.8432 4.86 525 2.7233
2.8499 5.09 550 2.6794
2.76 5.32 575 2.6310
2.7197 5.56 600 2.5857
2.793 5.79 625 2.5458
2.6895 6.02 650 2.4991
2.651 6.25 675 2.4496
2.5484 6.48 700 2.4014
2.5728 6.71 725 2.3471
2.4865 6.94 750 2.2953
2.4388 7.18 775 2.2369
2.4137 7.41 800 2.1799
2.3262 7.64 825 2.1285
2.3043 7.87 850 2.0836
2.2541 8.1 875 2.0299
2.1348 8.33 900 1.9730
2.1904 8.56 925 1.9211
2.0869 8.8 950 1.8719
2.1606 9.03 975 1.8210
1.9323 9.26 1000 1.7712
1.9892 9.49 1025 1.7254
1.9407 9.72 1050 1.6757
1.8791 9.95 1075 1.6214
1.7791 10.19 1100 1.5702
1.7523 10.42 1125 1.5284
1.7336 10.65 1150 1.4912
1.7709 10.88 1175 1.4475
1.6533 11.11 1200 1.3941
1.5671 11.34 1225 1.3536
1.5394 11.57 1250 1.3209
1.6085 11.81 1275 1.2921
1.5465 12.04 1300 1.2599
1.4172 12.27 1325 1.2292
1.4422 12.5 1350 1.1927
1.4708 12.73 1375 1.1563
1.3859 12.96 1400 1.1260
1.2036 13.19 1425 1.0932
1.3393 13.43 1450 1.0697
1.3203 13.66 1475 1.0376
1.2902 13.89 1500 1.0084
1.2356 14.12 1525 0.9760
1.2329 14.35 1550 0.9531
1.2039 14.58 1575 0.9343
1.1521 14.81 1600 0.9084
1.0754 15.05 1625 0.8786
1.0786 15.28 1650 0.8620
1.1052 15.51 1675 0.8395
1.0765 15.74 1700 0.8192
1.0817 15.97 1725 0.8002
1.0285 16.2 1750 0.7715
1.0313 16.44 1775 0.7612
0.9682 16.67 1800 0.7458
1.0025 16.9 1825 0.7267
0.9516 17.13 1850 0.7052
0.9475 17.36 1875 0.6952
0.8851 17.59 1900 0.6745
0.9463 17.82 1925 0.6602
0.8937 18.06 1950 0.6436
0.8135 18.29 1975 0.6316
0.8738 18.52 2000 0.6172
0.8585 18.75 2025 0.6072
0.8782 18.98 2050 0.5968
0.8324 19.21 2075 0.5789
0.7818 19.44 2100 0.5688
0.8375 19.68 2125 0.5602
0.7838 19.91 2150 0.5498
0.8015 20.14 2175 0.5369
0.724 20.37 2200 0.5299
0.7298 20.6 2225 0.5233
0.8079 20.83 2250 0.5141
0.77 21.06 2275 0.5058
0.7299 21.3 2300 0.4995
0.7152 21.53 2325 0.4893
0.6905 21.76 2350 0.4882
0.7492 21.99 2375 0.4779
0.6817 22.22 2400 0.4681
0.6893 22.45 2425 0.4652
0.7098 22.69 2450 0.4611
0.7063 22.92 2475 0.4582
0.6562 23.15 2500 0.4511
0.7083 23.38 2525 0.4474
0.6684 23.61 2550 0.4438
0.6688 23.84 2575 0.4398
0.6561 24.07 2600 0.4334
0.6664 24.31 2625 0.4318
0.6418 24.54 2650 0.4294
0.6723 24.77 2675 0.4249
0.6164 25.0 2700 0.4215
0.6348 25.23 2725 0.4203
0.6464 25.46 2750 0.4182
0.6392 25.69 2775 0.4171
0.6186 25.93 2800 0.4156
0.6447 26.16 2825 0.4138
0.6445 26.39 2850 0.4114
0.6037 26.62 2875 0.4109
0.6074 26.85 2900 0.4099
0.6509 27.08 2925 0.4092
0.6416 27.31 2950 0.4082
0.6391 27.55 2975 0.4075
0.594 27.78 3000 0.4071
0.6231 28.01 3025 0.4066
0.6151 28.24 3050 0.4061
0.6464 28.47 3075 0.4056
0.6024 28.7 3100 0.4054
0.6277 28.94 3125 0.4052
0.6017 29.17 3150 0.4052
0.6226 29.4 3175 0.4051
0.6084 29.63 3200 0.4051
0.639 29.86 3225 0.4051
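Assuming the reported loss is mean token-level cross-entropy in nats (the transformers default for causal LMs), it converts to perplexity via exp(loss), which gives a quick sense of the improvement over training:

```python
import math

# Losses read from the table above.
initial_eval_loss = 3.3796  # first evaluation, step 25
final_eval_loss = 0.4051    # final evaluation, step 3225

# Perplexity = exp(cross-entropy loss in nats).
initial_ppl = math.exp(initial_eval_loss)
final_ppl = math.exp(final_eval_loss)

print(f"perplexity: {initial_ppl:.2f} -> {final_ppl:.2f}")
```

So validation perplexity drops from roughly 29 to roughly 1.5 over the 30 epochs.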

Framework versions

  • Transformers 4.35.2
  • PyTorch 2.1.0+cu121
  • Datasets 2.16.0
  • Tokenizers 0.15.0

Model tree for C0uchP0tat0/laws_rugpt3medium_finetune
