openhermes-mistral-dpo-gptq

This model is a fine-tuned version of TheBloke/OpenHermes-2-Mistral-7B-GPTQ, trained with DPO on a preference dataset that this card does not name. It achieves the following results on the evaluation set:

Loss: 0.4346
Rewards/chosen: 0.6886
Rewards/rejected: -0.1517
Rewards/accuracies: 0.875
Rewards/margins: 0.8403
Logps/rejected: -258.0681
Logps/chosen: -269.4644
Logits/rejected: -2.3873
Logits/chosen: -2.4450
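The metric names above match those logged by TRL's `DPOTrainer`. Assuming the card follows that library's default sigmoid DPO loss, each reward is β times the difference between the policy's and the frozen reference model's log-probability of a completion. Rewards/margins is chosen minus rejected (0.6886 - (-0.1517) = 0.8403). Rewards/accuracies is the fraction of pairs whose chosen reward exceeds the rejected one, and Logps/* are the policy's summed log-probabilities. The minimal sketch below computes these from per-pair log-probabilities. The helper names are hypothetical, β = 0.1 is an assumption (TRL's default; the card does not state β), and the Logits/* metrics are omitted because they need the full logit tensors.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # assumption: TRL's default; the card does not state beta
) -> dict:
    """Batch-averaged DPO metrics from summed per-sequence log-probabilities."""
    n = len(policy_chosen)
    # Implicit rewards: beta * (log pi(y|x) - log pi_ref(y|x)) for each completion.
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        # Sigmoid DPO loss, averaged over pairs (not computed from the mean margin).
        "loss": sum(-log_sigmoid(m) for m in margins) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
        "rewards/margins": sum(margins) / n,
        "logps/chosen": sum(policy_chosen) / n,
        "logps/rejected": sum(policy_rejected) / n,
    }


# Two illustrative pairs (made-up log-probabilities, not from this model).
m = dpo_metrics([-10.0, -12.0], [-15.0, -9.0], [-11.0, -12.5], [-14.0, -10.0])
print(m["rewards/accuracies"])  # 0.5: only the first pair ranks chosen above rejected
```

Under the same assumption, the loss is averaged over pairs rather than computed from the mean margin. This is why the reported loss of 0.4346 is higher than -log σ(0.8403) ≈ 0.359: because -log σ is convex, the pair-averaged loss cannot be lower than the loss at the mean margin.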

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 4
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2
training_steps: 100
mixed_precision_training: Native AMP
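With `lr_scheduler_type: linear` and 2 warmup steps, the learning rate rises linearly from 0 to 2e-4 over the first two optimizer steps and then decays linearly to 0 at step 100. The effective batch is 4 x 2 = 8 pairs per optimizer step, which matches `total_train_batch_size` if training ran on a single device. The minimal sketch below reproduces that schedule. It assumes the scheduler follows the linear-with-warmup formula of `transformers.get_linear_schedule_with_warmup`. The helper `linear_warmup_lr` and the `num_devices = 1` value are hypothetical illustrations, not part of the training code.

```python
def linear_warmup_lr(step: int, base_lr: float = 2e-4,
                     warmup_steps: int = 2, total_steps: int = 100) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: scale linearly from base_lr at the end of warmup down to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Effective (total) train batch size per optimizer step, assuming a single device.
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
num_devices = 1
total_train_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

for step in (0, 1, 2, 51, 100):
    print(step, linear_warmup_lr(step))
```

The peak rate is reached only at step 2 and halves by step 51, so most of the 100 steps run well below the nominal 2e-4.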

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6927	0.02	5	0.6723	-0.0624	-0.1130	0.5	0.0506	-257.6814	-276.9746	-2.3921	-2.4532
0.6896	0.04	10	0.6814	-0.0837	-0.1949	0.5625	0.1113	-258.5006	-277.1875	-2.3785	-2.4393
0.7286	0.06	15	0.7217	-0.1116	-0.2049	0.8125	0.0933	-258.6005	-277.4668	-2.3732	-2.4343
0.6049	0.08	20	0.6488	-0.5231	-0.7234	0.9375	0.2003	-263.7849	-281.5815	-2.3599	-2.4201
3.1019	0.1	25	0.6202	-0.7269	-1.0069	0.9375	0.2800	-266.6205	-283.6199	-2.3529	-2.4132
3.4522	0.12	30	0.6238	-0.8793	-1.2160	0.875	0.3367	-268.7114	-285.1440	-2.3418	-2.4001
1.7538	0.14	35	0.6336	-0.5977	-0.8794	0.875	0.2816	-265.3451	-282.3282	-2.3479	-2.4068
0.6167	0.16	40	0.6979	0.0308	-0.1700	0.8125	0.2008	-258.2513	-276.0429	-2.3591	-2.4196
1.5103	0.18	45	0.7053	0.0521	-0.1713	0.875	0.2233	-258.2638	-275.8300	-2.3607	-2.4207
0.6762	0.2	50	0.7144	0.1606	-0.1470	0.875	0.3076	-258.0209	-274.7448	-2.3658	-2.4243
0.6587	0.22	55	0.7123	0.1399	-0.2934	0.8125	0.4333	-259.4854	-274.9521	-2.3670	-2.4244
0.7563	0.24	60	0.7987	0.4547	0.0155	0.8125	0.4391	-256.3959	-271.8042	-2.3793	-2.4378
0.8208	0.26	65	0.8288	1.0234	0.5622	0.8125	0.4611	-250.9289	-266.1172	-2.4012	-2.4618
0.9904	0.28	70	0.7683	1.4763	0.9615	0.8125	0.5148	-246.9362	-261.5881	-2.4184	-2.4798
0.8327	0.3	75	0.6556	1.6107	1.0087	0.8125	0.6019	-246.4639	-260.2441	-2.4218	-2.4838
0.8238	0.32	80	0.5524	1.5571	0.8762	0.8125	0.6809	-247.7892	-260.7801	-2.4168	-2.4797
0.7712	0.34	85	0.5144	1.3444	0.6352	0.8125	0.7092	-250.1996	-262.9072	-2.4079	-2.4697
0.691	0.36	90	0.4688	1.0225	0.2544	0.875	0.7682	-254.0075	-266.1254	-2.3981	-2.4588
0.6386	0.38	95	0.4490	0.8498	0.0425	0.875	0.8074	-256.1265	-267.8524	-2.3927	-2.4521
0.6413	0.4	100	0.4346	0.6886	-0.1517	0.875	0.8403	-258.0681	-269.4644	-2.3873	-2.4450
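Two relationships can be checked directly against the table. First, Rewards/margins equals Rewards/chosen minus Rewards/rejected in every row, up to 4-decimal rounding. Second, Epoch advances by 0.02 every 5 steps, so one epoch is about 250 optimizer steps. At 8 pairs per step, that suggests a training split of roughly 2,000 preference pairs. This is only an estimate: the Epoch column is rounded to two decimals and the card does not name the dataset. The sketch below recomputes both from a subset of rows copied from the table. The `EvalRow` type and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalRow:
    epoch: float
    step: int
    rewards_chosen: float
    rewards_rejected: float
    rewards_margins: float


# A subset of rows copied from the training-results table above.
ROWS = [
    EvalRow(0.02, 5, -0.0624, -0.1130, 0.0506),
    EvalRow(0.10, 25, -0.7269, -1.0069, 0.2800),
    EvalRow(0.20, 50, 0.1606, -0.1470, 0.3076),
    EvalRow(0.30, 75, 1.6107, 1.0087, 0.6019),
    EvalRow(0.40, 100, 0.6886, -0.1517, 0.8403),
]

TOTAL_TRAIN_BATCH_SIZE = 8  # from the hyperparameter list


def margin_mismatches(rows, tol=2e-4):
    """Steps where margins != chosen - rejected beyond 4-decimal rounding error."""
    return [r.step for r in rows
            if abs((r.rewards_chosen - r.rewards_rejected) - r.rewards_margins) > tol]


def estimate_train_pairs(row, batch_size=TOTAL_TRAIN_BATCH_SIZE):
    """Rough training-set size: steps per epoch times examples per step."""
    steps_per_epoch = row.step / row.epoch  # Epoch is rounded, so this is approximate
    return round(steps_per_epoch * batch_size)


print(margin_mismatches(ROWS))
print(estimate_train_pairs(ROWS[-1]))
```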

Framework versions

PEFT 0.8.2
Transformers 4.37.2
Pytorch 2.0.1+cu117
Datasets 2.17.1
Tokenizers 0.15.2
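Before reproducing the results or loading the model, it can help to confirm that the local environment matches the versions listed above. The sketch below uses only the standard library's `importlib.metadata`. The helper `version_mismatches` is hypothetical, and the distribution names are assumed to be the usual PyPI names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`).

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" (PyTorch is distributed as "torch").
PINNED = {
    "peft": "0.8.2",
    "transformers": "4.37.2",
    "torch": "2.0.1+cu117",
    "datasets": "2.17.1",
    "tokenizers": "0.15.2",
}


def version_mismatches(pins):
    """Map each package whose installed version differs to (installed, expected).

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (installed, expected)
    return mismatches


for name, (installed, expected) in version_mismatches(PINNED).items():
    print(f"{name}: installed {installed}, card lists {expected}")
```

The comparison uses exact string equality, so a PyTorch build without the `+cu117` local tag is reported as a mismatch. Relax the check if only the release number matters.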

Model repository: hongce-tech/openhermes-mistral-dpo-gptq