phi-2-openhermes-128k-v2-dpo-combined

This model is a fine-tuned version of rasyosef/phi-2-sft-openhermes-128k-v2-merged. The training dataset is not recorded in this card. At the final logged checkpoint (step 3400), it achieves the following results on the evaluation set:

Loss: 0.5599
Rewards/chosen: -0.3234
Rewards/rejected: -0.9542
Rewards/accuracies: 0.6812
Rewards/margins: 0.6309
Logps/rejected: -158.4123
Logps/chosen: -144.1796
Logits/rejected: -1.6783
Logits/chosen: -1.6735
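
The reward metrics above are related to one another. A minimal sketch, assuming the usual convention of DPO-style trainers in which the reward margin is the chosen reward minus the rejected reward (this card does not state the training objective; the assumption rests only on the model name and the metric names):

```python
# Reported evaluation metrics, copied from the list above.
rewards_chosen = -0.3234
rewards_rejected = -0.9542
reported_margin = 0.6309

# Assumption: margin = chosen reward - rejected reward,
# averaged over the evaluation set.
computed_margin = rewards_chosen - rewards_rejected

# The reported values are rounded to four decimals, so compare with a tolerance.
assert abs(computed_margin - reported_margin) < 1e-3
print(f"computed margin: {computed_margin:.4f}")
```

The small gap between the computed and reported margin is consistent with each metric being rounded independently.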

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 250
num_epochs: 2
mixed_precision_training: Native AMP
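
The batch-size and scheduler settings above can be illustrated with a short sketch. It assumes a single training device and a schedule with linear warmup followed by cosine decay to zero; `lr_at`, `num_devices`, and `total_steps` are hypothetical names and values chosen for illustration, not taken from the training script. The value 3400 is the last step logged in the training results, not necessarily the exact total.

```python
import math

# Hyperparameters copied from the list above.
learning_rate = 2e-06
train_batch_size = 4
gradient_accumulation_steps = 4
warmup_steps = 250

# Assumption: one device, so the effective batch size is the per-device
# batch size times the number of gradient accumulation steps.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def lr_at(step: int, total_steps: int) -> float:
    """Illustrative schedule: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical total step count, used only for illustration.
total_steps = 3400
for step in (0, 125, 250, 1825, 3400):
    print(step, lr_at(step, total_steps))
```

Under these assumptions the learning rate reaches its peak of 2e-06 at step 250 and decays toward zero by the end of training.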

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6927	0.0583	100	0.6927	-0.0007	-0.0020	0.4976	0.0012	-148.8894	-140.9533	-1.7645	-1.7622
0.6903	0.1166	200	0.6848	-0.0085	-0.0260	0.5556	0.0175	-149.1299	-141.0305	-1.7667	-1.7644
0.6757	0.1749	300	0.6530	-0.0338	-0.1263	0.6618	0.0924	-150.1323	-141.2841	-1.7686	-1.7658
0.6457	0.2332	400	0.6189	-0.0854	-0.2869	0.7053	0.2015	-151.7387	-141.7998	-1.7678	-1.7649
0.6231	0.2915	500	0.5994	-0.1345	-0.4309	0.6908	0.2964	-153.1783	-142.2908	-1.7660	-1.7625
0.6001	0.3499	600	0.5882	-0.1854	-0.5670	0.7041	0.3816	-154.5396	-142.7997	-1.7626	-1.7594
0.6071	0.4082	700	0.5832	-0.2023	-0.6173	0.7126	0.4149	-155.0424	-142.9693	-1.7564	-1.7533
0.6114	0.4665	800	0.5801	-0.2174	-0.6640	0.7017	0.4466	-155.5101	-143.1204	-1.7551	-1.7514
0.5963	0.5248	900	0.5749	-0.2216	-0.6958	0.7198	0.4742	-155.8275	-143.1621	-1.7411	-1.7376
0.5958	0.5831	1000	0.5739	-0.2352	-0.7314	0.7077	0.4961	-156.1834	-143.2981	-1.7384	-1.7346
0.5883	0.6414	1100	0.5719	-0.2631	-0.7884	0.6920	0.5253	-156.7536	-143.5765	-1.7338	-1.7297
0.5821	0.6997	1200	0.5712	-0.2920	-0.8496	0.6993	0.5575	-157.3655	-143.8663	-1.7305	-1.7266
0.6037	0.7580	1300	0.5691	-0.2837	-0.8327	0.6993	0.5490	-157.1967	-143.7830	-1.7239	-1.7196
0.5781	0.8163	1400	0.5680	-0.3013	-0.8689	0.6920	0.5676	-157.5589	-143.9591	-1.7173	-1.7132
0.5985	0.8746	1500	0.5685	-0.2801	-0.8286	0.7005	0.5485	-157.1556	-143.7466	-1.7099	-1.7055
0.5925	0.9329	1600	0.5677	-0.2742	-0.8259	0.7005	0.5516	-157.1285	-143.6882	-1.7002	-1.6959
0.6039	0.9913	1700	0.5658	-0.2697	-0.8189	0.7005	0.5492	-157.0589	-143.6426	-1.6978	-1.6936
0.5883	1.0496	1800	0.5648	-0.2695	-0.8269	0.7029	0.5574	-157.1392	-143.6413	-1.6960	-1.6915
0.5844	1.1079	1900	0.5644	-0.2821	-0.8480	0.6920	0.5659	-157.3497	-143.7664	-1.6906	-1.6863
0.5606	1.1662	2000	0.5646	-0.3007	-0.8863	0.6993	0.5856	-157.7325	-143.9527	-1.6925	-1.6878
0.5835	1.2245	2100	0.5631	-0.3071	-0.8997	0.6957	0.5926	-157.8670	-144.0166	-1.6917	-1.6875
0.5801	1.2828	2200	0.5622	-0.3144	-0.9213	0.6884	0.6069	-158.0828	-144.0901	-1.6850	-1.6805
0.6022	1.3411	2300	0.5637	-0.3096	-0.9078	0.6993	0.5982	-157.9474	-144.0419	-1.6837	-1.6793
0.5694	1.3994	2400	0.5618	-0.3143	-0.9225	0.6884	0.6082	-158.0945	-144.0888	-1.6834	-1.6790
0.5703	1.4577	2500	0.5612	-0.3125	-0.9247	0.6957	0.6121	-158.1165	-144.0712	-1.6803	-1.6758
0.5732	1.5160	2600	0.5590	-0.3150	-0.9377	0.6957	0.6228	-158.2469	-144.0954	-1.6801	-1.6750
0.5584	1.5743	2700	0.5603	-0.3206	-0.9441	0.6848	0.6235	-158.3112	-144.1520	-1.6796	-1.6749
0.5677	1.6327	2800	0.5605	-0.3233	-0.9494	0.6884	0.6260	-158.3634	-144.1790	-1.6800	-1.6752
0.5750	1.6910	2900	0.5609	-0.3235	-0.9500	0.6920	0.6265	-158.3701	-144.1811	-1.6788	-1.6741
0.5752	1.7493	3000	0.5604	-0.3242	-0.9528	0.6920	0.6286	-158.3975	-144.1876	-1.6782	-1.6734
0.5700	1.8076	3100	0.5609	-0.3242	-0.9536	0.6896	0.6295	-158.4062	-144.1877	-1.6779	-1.6727
0.5759	1.8659	3200	0.5608	-0.3244	-0.9537	0.6884	0.6293	-158.4068	-144.1899	-1.6783	-1.6734
0.5789	1.9242	3300	0.5600	-0.3228	-0.9558	0.6884	0.6330	-158.4273	-144.1738	-1.6778	-1.6727
0.5622	1.9825	3400	0.5599	-0.3234	-0.9542	0.6812	0.6309	-158.4123	-144.1796	-1.6783	-1.6735
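
A few quantities can be read off the table with a short script. The sketch below uses a handful of rows copied from the table. It finds the row with the lowest validation loss and estimates the number of optimizer steps per epoch. The estimate of training examples assumes that each step consumes `total_train_batch_size` = 16 examples; the card does not state the dataset size.

```python
# (step, epoch, validation_loss) for a few rows copied from the table above.
rows = [
    (100, 0.0583, 0.6927),
    (900, 0.5248, 0.5749),
    (1700, 0.9913, 0.5658),
    (2600, 1.5160, 0.5590),
    (3000, 1.7493, 0.5604),
    (3400, 1.9825, 0.5599),
]

# Row with the lowest validation loss among the selected rows.
best_step, best_epoch, best_loss = min(rows, key=lambda r: r[2])

# Steps per epoch, estimated from the last logged row.
last_step, last_epoch, _ = rows[-1]
steps_per_epoch = last_step / last_epoch

# Assumption: each optimizer step consumes 16 training examples.
approx_train_examples = steps_per_epoch * 16

print(f"lowest validation loss {best_loss} at step {best_step}")
print(f"about {steps_per_epoch:.0f} steps per epoch")
print(f"about {approx_train_examples:.0f} training examples per epoch")
```

Across the full table the lowest validation loss is also the one at step 2600 (0.5590), slightly below the final checkpoint's 0.5599.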

Framework versions

PEFT 0.12.0
Transformers 4.44.0
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
