# phi-2-openhermes-128k-v2-dpo-combined
This model is a DPO fine-tuned version of [rasyosef/phi-2-sft-openhermes-128k-v2-merged](https://huggingface.co/rasyosef/phi-2-sft-openhermes-128k-v2-merged) on an unspecified preference dataset.
It achieves the following results on the evaluation set (the reward and log-probability metrics are explained below the list):
- Loss: 0.5599
- Rewards/chosen: -0.3234
- Rewards/rejected: -0.9542
- Rewards/accuracies: 0.6812
- Rewards/margins: 0.6309
- Logps/rejected: -158.4123
- Logps/chosen: -144.1796
- Logits/rejected: -1.6783
- Logits/chosen: -1.6735
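
These columns follow the logging conventions of trl's `DPOTrainer` (an assumption; the card itself does not define them, but the metric names and the initial validation loss of roughly ln 2 ≈ 0.693 in the results table match DPO training). Under those conventions, with policy $\pi_\theta$, reference model $\pi_\text{ref}$, prompt $x$, and chosen/rejected completions $y_w$ and $y_l$:

$$
r_\text{chosen} = \beta \left[\log \pi_\theta(y_w \mid x) - \log \pi_\text{ref}(y_w \mid x)\right], \qquad
r_\text{rejected} = \beta \left[\log \pi_\theta(y_l \mid x) - \log \pi_\text{ref}(y_l \mid x)\right]
$$

$$
\mathcal{L}_\text{DPO} = -\log \sigma\!\left(r_\text{chosen} - r_\text{rejected}\right)
$$

`Rewards/margins` is $r_\text{chosen} - r_\text{rejected}$, `Rewards/accuracies` is the fraction of evaluation pairs with $r_\text{chosen} > r_\text{rejected}$, and the `Logps/*` and `Logits/*` columns are the policy's log-probabilities and average logits on the chosen and rejected completions. The value of $\beta$ is not reported in this card.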
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hypothetical sketch of the corresponding training call follows the list):
- learning_rate: 2e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 250
- num_epochs: 2
- mixed_precision_training: Native AMP
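
The card does not include the training script, and trl is not listed among the framework versions below, so the following is only a hypothetical sketch of how these hyperparameters might map onto a DPO run with trl's `DPOTrainer`. The preference dataset, the DPO beta, and any LoRA/PEFT adapter settings are not reported, so they are stubbed out or omitted here.

```python
# Hypothetical reconstruction; only the hyperparameter values come from this card.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "rasyosef/phi-2-sft-openhermes-128k-v2-merged"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Tiny stand-in preference dataset; the real training data is not named in the card.
preference_dataset = Dataset.from_dict({
    "prompt":   ["What is the capital of France?"],
    "chosen":   ["The capital of France is Paris."],
    "rejected": ["I'm not sure."],
})

args = DPOConfig(
    output_dir="phi-2-openhermes-128k-v2-dpo-combined",
    learning_rate=2e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,   # total train batch size 4 x 4 = 16
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_steps=250,
    seed=42,
    fp16=True,                       # "Native AMP" mixed precision
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # trl keeps a frozen copy of the policy as the reference
    args=args,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,             # renamed `processing_class` in recent trl releases
)
trainer.train()
```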
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6927 | 0.0583 | 100 | 0.6927 | -0.0007 | -0.0020 | 0.4976 | 0.0012 | -148.8894 | -140.9533 | -1.7645 | -1.7622 |
| 0.6903 | 0.1166 | 200 | 0.6848 | -0.0085 | -0.0260 | 0.5556 | 0.0175 | -149.1299 | -141.0305 | -1.7667 | -1.7644 |
| 0.6757 | 0.1749 | 300 | 0.6530 | -0.0338 | -0.1263 | 0.6618 | 0.0924 | -150.1323 | -141.2841 | -1.7686 | -1.7658 |
| 0.6457 | 0.2332 | 400 | 0.6189 | -0.0854 | -0.2869 | 0.7053 | 0.2015 | -151.7387 | -141.7998 | -1.7678 | -1.7649 |
| 0.6231 | 0.2915 | 500 | 0.5994 | -0.1345 | -0.4309 | 0.6908 | 0.2964 | -153.1783 | -142.2908 | -1.7660 | -1.7625 |
| 0.6001 | 0.3499 | 600 | 0.5882 | -0.1854 | -0.5670 | 0.7041 | 0.3816 | -154.5396 | -142.7997 | -1.7626 | -1.7594 |
| 0.6071 | 0.4082 | 700 | 0.5832 | -0.2023 | -0.6173 | 0.7126 | 0.4149 | -155.0424 | -142.9693 | -1.7564 | -1.7533 |
| 0.6114 | 0.4665 | 800 | 0.5801 | -0.2174 | -0.6640 | 0.7017 | 0.4466 | -155.5101 | -143.1204 | -1.7551 | -1.7514 |
| 0.5963 | 0.5248 | 900 | 0.5749 | -0.2216 | -0.6958 | 0.7198 | 0.4742 | -155.8275 | -143.1621 | -1.7411 | -1.7376 |
| 0.5958 | 0.5831 | 1000 | 0.5739 | -0.2352 | -0.7314 | 0.7077 | 0.4961 | -156.1834 | -143.2981 | -1.7384 | -1.7346 |
| 0.5883 | 0.6414 | 1100 | 0.5719 | -0.2631 | -0.7884 | 0.6920 | 0.5253 | -156.7536 | -143.5765 | -1.7338 | -1.7297 |
| 0.5821 | 0.6997 | 1200 | 0.5712 | -0.2920 | -0.8496 | 0.6993 | 0.5575 | -157.3655 | -143.8663 | -1.7305 | -1.7266 |
| 0.6037 | 0.7580 | 1300 | 0.5691 | -0.2837 | -0.8327 | 0.6993 | 0.5490 | -157.1967 | -143.7830 | -1.7239 | -1.7196 |
| 0.5781 | 0.8163 | 1400 | 0.5680 | -0.3013 | -0.8689 | 0.6920 | 0.5676 | -157.5589 | -143.9591 | -1.7173 | -1.7132 |
| 0.5985 | 0.8746 | 1500 | 0.5685 | -0.2801 | -0.8286 | 0.7005 | 0.5485 | -157.1556 | -143.7466 | -1.7099 | -1.7055 |
| 0.5925 | 0.9329 | 1600 | 0.5677 | -0.2742 | -0.8259 | 0.7005 | 0.5516 | -157.1285 | -143.6882 | -1.7002 | -1.6959 |
| 0.6039 | 0.9913 | 1700 | 0.5658 | -0.2697 | -0.8189 | 0.7005 | 0.5492 | -157.0589 | -143.6426 | -1.6978 | -1.6936 |
| 0.5883 | 1.0496 | 1800 | 0.5648 | -0.2695 | -0.8269 | 0.7029 | 0.5574 | -157.1392 | -143.6413 | -1.6960 | -1.6915 |
| 0.5844 | 1.1079 | 1900 | 0.5644 | -0.2821 | -0.8480 | 0.6920 | 0.5659 | -157.3497 | -143.7664 | -1.6906 | -1.6863 |
| 0.5606 | 1.1662 | 2000 | 0.5646 | -0.3007 | -0.8863 | 0.6993 | 0.5856 | -157.7325 | -143.9527 | -1.6925 | -1.6878 |
| 0.5835 | 1.2245 | 2100 | 0.5631 | -0.3071 | -0.8997 | 0.6957 | 0.5926 | -157.8670 | -144.0166 | -1.6917 | -1.6875 |
| 0.5801 | 1.2828 | 2200 | 0.5622 | -0.3144 | -0.9213 | 0.6884 | 0.6069 | -158.0828 | -144.0901 | -1.6850 | -1.6805 |
| 0.6022 | 1.3411 | 2300 | 0.5637 | -0.3096 | -0.9078 | 0.6993 | 0.5982 | -157.9474 | -144.0419 | -1.6837 | -1.6793 |
| 0.5694 | 1.3994 | 2400 | 0.5618 | -0.3143 | -0.9225 | 0.6884 | 0.6082 | -158.0945 | -144.0888 | -1.6834 | -1.6790 |
| 0.5703 | 1.4577 | 2500 | 0.5612 | -0.3125 | -0.9247 | 0.6957 | 0.6121 | -158.1165 | -144.0712 | -1.6803 | -1.6758 |
| 0.5732 | 1.5160 | 2600 | 0.5590 | -0.3150 | -0.9377 | 0.6957 | 0.6228 | -158.2469 | -144.0954 | -1.6801 | -1.6750 |
| 0.5584 | 1.5743 | 2700 | 0.5603 | -0.3206 | -0.9441 | 0.6848 | 0.6235 | -158.3112 | -144.1520 | -1.6796 | -1.6749 |
| 0.5677 | 1.6327 | 2800 | 0.5605 | -0.3233 | -0.9494 | 0.6884 | 0.6260 | -158.3634 | -144.1790 | -1.6800 | -1.6752 |
| 0.575 | 1.6910 | 2900 | 0.5609 | -0.3235 | -0.9500 | 0.6920 | 0.6265 | -158.3701 | -144.1811 | -1.6788 | -1.6741 |
| 0.5752 | 1.7493 | 3000 | 0.5604 | -0.3242 | -0.9528 | 0.6920 | 0.6286 | -158.3975 | -144.1876 | -1.6782 | -1.6734 |
| 0.57 | 1.8076 | 3100 | 0.5609 | -0.3242 | -0.9536 | 0.6896 | 0.6295 | -158.4062 | -144.1877 | -1.6779 | -1.6727 |
| 0.5759 | 1.8659 | 3200 | 0.5608 | -0.3244 | -0.9537 | 0.6884 | 0.6293 | -158.4068 | -144.1899 | -1.6783 | -1.6734 |
| 0.5789 | 1.9242 | 3300 | 0.5600 | -0.3228 | -0.9558 | 0.6884 | 0.6330 | -158.4273 | -144.1738 | -1.6778 | -1.6727 |
| 0.5622 | 1.9825 | 3400 | 0.5599 | -0.3234 | -0.9542 | 0.6812 | 0.6309 | -158.4123 | -144.1796 | -1.6783 | -1.6735 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
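
Since PEFT appears in the framework versions, this repository presumably hosts an adapter trained on top of the base model rather than a fully merged checkpoint; that, and the adapter repo id below, are assumptions, as the card gives no usage instructions. A minimal inference sketch with the listed Transformers and PEFT versions could look like the following (if the weights are actually merged, loading the repo directly with `AutoModelForCausalLM` would suffice).

```python
# Hypothetical usage sketch; repo ids are inferred from the model card title.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id    = "rasyosef/phi-2-sft-openhermes-128k-v2-merged"
adapter_id = "rasyosef/phi-2-openhermes-128k-v2-dpo-combined"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate; drop to load on CPU
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the DPO adapter
model.eval()

prompt = "Explain the difference between supervised fine-tuning and DPO in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```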