openhermes-mistral-dpo-gptq

This model is a fine-tuned version of TheBloke/OpenHermes-2-Mistral-7B-GPTQ. The training dataset was not recorded (the card lists it as None). It achieves the following results on the evaluation set:

  • Loss: 0.4346
  • Rewards/chosen: 0.6886
  • Rewards/rejected: -0.1517
  • Rewards/accuracies: 0.875
  • Rewards/margins: 0.8403
  • Logps/rejected: -258.0681
  • Logps/chosen: -269.4644
  • Logits/rejected: -2.3873
  • Logits/chosen: -2.4450
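
The reward margin can be checked against the two reward values reported above. The sketch below assumes the usual preference-tuning convention that the margin is the chosen reward minus the rejected reward; the variable names are illustrative and not part of the card.

```python
# Evaluation metrics copied from the list above.
rewards_chosen = 0.6886
rewards_rejected = -0.1517
reported_margin = 0.8403

# Assumed convention: margin = chosen reward - rejected reward.
computed_margin = rewards_chosen - rewards_rejected

print(f"computed margin: {computed_margin:.4f}")
print(f"reported margin: {reported_margin:.4f}")
```

A positive margin means that, on average, the model scores the chosen responses higher than the rejected ones on the evaluation set.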

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 100
  • mixed_precision_training: Native AMP
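
Two of these values follow from the others, and a short sketch makes the relationship explicit. It assumes training ran on a single device (the card does not say) and that the linear scheduler has the common shape of a linear warmup followed by a linear decay to zero. The function names are illustrative only.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 2
TRAINING_STEPS = 100
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accum_steps * devices


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = TRAINING_STEPS - step
    return LEARNING_RATE * max(0.0, remaining / (TRAINING_STEPS - WARMUP_STEPS))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for step in (0, 1, 2, 51, 100):
    print(step, linear_lr(step))
```

Under these assumptions the product 4 × 2 × 1 matches the reported total_train_batch_size of 8, and the learning rate peaks at 0.0002 after the two warmup steps before decaying to zero at step 100.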

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6927 | 0.02 | 5 | 0.6723 | -0.0624 | -0.1130 | 0.5 | 0.0506 | -257.6814 | -276.9746 | -2.3921 | -2.4532 |
| 0.6896 | 0.04 | 10 | 0.6814 | -0.0837 | -0.1949 | 0.5625 | 0.1113 | -258.5006 | -277.1875 | -2.3785 | -2.4393 |
| 0.7286 | 0.06 | 15 | 0.7217 | -0.1116 | -0.2049 | 0.8125 | 0.0933 | -258.6005 | -277.4668 | -2.3732 | -2.4343 |
| 0.6049 | 0.08 | 20 | 0.6488 | -0.5231 | -0.7234 | 0.9375 | 0.2003 | -263.7849 | -281.5815 | -2.3599 | -2.4201 |
| 3.1019 | 0.1 | 25 | 0.6202 | -0.7269 | -1.0069 | 0.9375 | 0.2800 | -266.6205 | -283.6199 | -2.3529 | -2.4132 |
| 3.4522 | 0.12 | 30 | 0.6238 | -0.8793 | -1.2160 | 0.875 | 0.3367 | -268.7114 | -285.1440 | -2.3418 | -2.4001 |
| 1.7538 | 0.14 | 35 | 0.6336 | -0.5977 | -0.8794 | 0.875 | 0.2816 | -265.3451 | -282.3282 | -2.3479 | -2.4068 |
| 0.6167 | 0.16 | 40 | 0.6979 | 0.0308 | -0.1700 | 0.8125 | 0.2008 | -258.2513 | -276.0429 | -2.3591 | -2.4196 |
| 1.5103 | 0.18 | 45 | 0.7053 | 0.0521 | -0.1713 | 0.875 | 0.2233 | -258.2638 | -275.8300 | -2.3607 | -2.4207 |
| 0.6762 | 0.2 | 50 | 0.7144 | 0.1606 | -0.1470 | 0.875 | 0.3076 | -258.0209 | -274.7448 | -2.3658 | -2.4243 |
| 0.6587 | 0.22 | 55 | 0.7123 | 0.1399 | -0.2934 | 0.8125 | 0.4333 | -259.4854 | -274.9521 | -2.3670 | -2.4244 |
| 0.7563 | 0.24 | 60 | 0.7987 | 0.4547 | 0.0155 | 0.8125 | 0.4391 | -256.3959 | -271.8042 | -2.3793 | -2.4378 |
| 0.8208 | 0.26 | 65 | 0.8288 | 1.0234 | 0.5622 | 0.8125 | 0.4611 | -250.9289 | -266.1172 | -2.4012 | -2.4618 |
| 0.9904 | 0.28 | 70 | 0.7683 | 1.4763 | 0.9615 | 0.8125 | 0.5148 | -246.9362 | -261.5881 | -2.4184 | -2.4798 |
| 0.8327 | 0.3 | 75 | 0.6556 | 1.6107 | 1.0087 | 0.8125 | 0.6019 | -246.4639 | -260.2441 | -2.4218 | -2.4838 |
| 0.8238 | 0.32 | 80 | 0.5524 | 1.5571 | 0.8762 | 0.8125 | 0.6809 | -247.7892 | -260.7801 | -2.4168 | -2.4797 |
| 0.7712 | 0.34 | 85 | 0.5144 | 1.3444 | 0.6352 | 0.8125 | 0.7092 | -250.1996 | -262.9072 | -2.4079 | -2.4697 |
| 0.691 | 0.36 | 90 | 0.4688 | 1.0225 | 0.2544 | 0.875 | 0.7682 | -254.0075 | -266.1254 | -2.3981 | -2.4588 |
| 0.6386 | 0.38 | 95 | 0.4490 | 0.8498 | 0.0425 | 0.875 | 0.8074 | -256.1265 | -267.8524 | -2.3927 | -2.4521 |
| 0.6413 | 0.4 | 100 | 0.4346 | 0.6886 | -0.1517 | 0.875 | 0.8403 | -258.0681 | -269.4644 | -2.3873 | -2.4450 |
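
To read the training results programmatically, the sketch below takes a hand-picked subset of the logged rows, finds the step with the lowest validation loss, and checks that each reported margin matches chosen minus rejected up to rounding. The tuple layout and names are illustrative, and the margin convention is an assumption rather than something the card states.

```python
# Subset of rows from the training results:
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margin)
rows = [
    (5, 0.6723, -0.0624, -0.1130, 0.0506),
    (25, 0.6202, -0.7269, -1.0069, 0.2800),
    (50, 0.7144, 0.1606, -0.1470, 0.3076),
    (65, 0.8288, 1.0234, 0.5622, 0.4611),
    (75, 0.6556, 1.6107, 1.0087, 0.6019),
    (100, 0.4346, 0.6886, -0.1517, 0.8403),
]

# Step with the lowest validation loss among the selected rows.
best_step, best_loss, *_ = min(rows, key=lambda row: row[1])

# Assumed convention: margin = chosen - rejected. The logged values are
# rounded to four decimals, so allow a small tolerance.
TOLERANCE = 5e-4
margins_consistent = all(
    abs((chosen - rejected) - margin) < TOLERANCE
    for _, _, chosen, rejected, margin in rows
)

print(f"lowest validation loss {best_loss} at step {best_step}")
print(f"margins consistent with chosen - rejected: {margins_consistent}")
```

In this subset, as in the full log, validation loss rises through the middle of the run and reaches its lowest value at the final step.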

Framework versions

  • PEFT 0.8.2
  • Transformers 4.37.2
  • Pytorch 2.0.1+cu117
  • Datasets 2.17.1
  • Tokenizers 0.15.2
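
Because PEFT appears among the framework versions, this repository plausibly holds a PEFT adapter trained on top of the GPTQ base model, though the card does not confirm this. Under that assumption, loading might look like the sketch below. The repository id is taken from the page's model tree line, `load_model` is a hypothetical helper name, and a GPTQ-capable backend plus a GPU are assumed to be available.

```python
BASE_MODEL_ID = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
# Assumption: this repository contains a PEFT adapter, not full model weights.
ADAPTER_ID = "hongce-tech/openhermes-mistral-dpo-gptq"


def load_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the quantized base model and attach the adapter on top of it."""
    # Imported lazily so this file can be imported without the libraries installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return tokenizer, model


if __name__ == "__main__":
    # Downloads several gigabytes of weights on first use.
    tokenizer, model = load_model()
```

If the repository turns out to hold merged weights instead of an adapter, loading it directly with `AutoModelForCausalLM.from_pretrained` would be the alternative to try.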

Model tree for hongce-tech/openhermes-mistral-dpo-gptq