zephyr-7b-alpha / README.md
metadata
base_model: HuggingFaceH4/mistral-7b-ift
tags:
  - generated_from_trainer
model-index:
  - name: mistral-7b-dpo-v0.4
    results: []

mistral-7b-dpo-v0.4

This model is a fine-tuned version of HuggingFaceH4/mistral-7b-ift on the HuggingFaceH4/ultrafeedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4605
  • Rewards/chosen: -0.5053
  • Rewards/rejected: -1.8752
  • Rewards/accuracies: 0.7812
  • Rewards/margins: 1.3699
  • Logps/rejected: -327.4286
  • Logps/chosen: -297.1040
  • Logits/rejected: -2.7153
  • Logits/chosen: -2.7447
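
The reported reward margin appears to be the difference between the chosen and rejected rewards. The card does not state this; it is an assumption based on the metric names. A minimal check using the numbers above:

```python
# Values copied from the evaluation summary above.
rewards_chosen = -0.5053
rewards_rejected = -1.8752
reported_margin = 1.3699


def reward_margin(chosen: float, rejected: float) -> float:
    """Reward margin, ASSUMING margin = chosen - rejected (not stated in the card)."""
    return chosen - rejected


margin = reward_margin(rewards_chosen, rewards_rejected)

# The reported values are rounded to four decimals, so compare with a tolerance.
margin_matches = abs(margin - reported_margin) < 1e-3
print(f"margin = {margin:.4f}, matches reported value: {margin_matches}")
```

A positive margin means the model assigns a higher reward to the chosen response than to the rejected one on average.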

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 16
  • total_train_batch_size: 32
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
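
The relationship between the per-device and total batch sizes, and the shape of a linear schedule with a 0.1 warmup ratio, can be sketched as follows. This is an illustration, not the training code: a gradient accumulation factor of 1 is assumed because none is listed, and `total_steps=2000` is a hypothetical value since the card does not give the exact step count.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Total batch size per optimizer step. grad_accum=1 is an assumption."""
    return per_device * num_devices * grad_accum


def linear_schedule_lr(step: int, total_steps: int,
                       peak_lr: float = 5e-07, warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Per-device sizes and device count are taken from the list above.
print(effective_batch_size(2, 16))  # train
print(effective_batch_size(4, 16))  # eval

# total_steps is HYPOTHETICAL; substitute the real number of optimizer steps.
total_steps = 2000
for step in (0, 100, 200, 1100, 2000):
    print(step, linear_schedule_lr(step, total_steps))
```

With these inputs the train and eval totals come out to 32 and 64, which agrees with the listed total_train_batch_size and total_eval_batch_size.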

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.5602 | 0.05 | 100 | 0.5589 | -0.3359 | -0.8168 | 0.7188 | 0.4809 | -306.2607 | -293.7161 | -2.6554 | -2.6797 |
| 0.4852 | 0.1 | 200 | 0.5136 | -0.5310 | -1.4994 | 0.8125 | 0.9684 | -319.9124 | -297.6181 | -2.5762 | -2.5957 |
| 0.5212 | 0.15 | 300 | 0.5168 | -0.1686 | -1.1760 | 0.7812 | 1.0074 | -313.4444 | -290.3699 | -2.6865 | -2.7125 |
| 0.5496 | 0.21 | 400 | 0.4835 | -0.1617 | -1.7170 | 0.8281 | 1.5552 | -324.2635 | -290.2326 | -2.7947 | -2.8218 |
| 0.5209 | 0.26 | 500 | 0.5054 | -0.4778 | -1.6604 | 0.7344 | 1.1826 | -323.1325 | -296.5546 | -2.8388 | -2.8667 |
| 0.4617 | 0.31 | 600 | 0.4910 | -0.3738 | -1.5180 | 0.7656 | 1.1442 | -320.2848 | -294.4741 | -2.8234 | -2.8521 |
| 0.4452 | 0.36 | 700 | 0.4838 | -0.4591 | -1.6576 | 0.7031 | 1.1986 | -323.0770 | -296.1796 | -2.7401 | -2.7653 |
| 0.4674 | 0.41 | 800 | 0.5077 | -0.5692 | -1.8659 | 0.7656 | 1.2967 | -327.2416 | -298.3818 | -2.6740 | -2.6945 |
| 0.4656 | 0.46 | 900 | 0.4927 | -0.5279 | -1.6614 | 0.7656 | 1.1335 | -323.1518 | -297.5553 | -2.7817 | -2.8015 |
| 0.4102 | 0.52 | 1000 | 0.4772 | -0.5767 | -2.0667 | 0.7656 | 1.4900 | -331.2578 | -298.5311 | -2.7160 | -2.7455 |
| 0.4663 | 0.57 | 1100 | 0.4740 | -0.8038 | -2.1018 | 0.7656 | 1.2980 | -331.9604 | -303.0741 | -2.6994 | -2.7257 |
| 0.4737 | 0.62 | 1200 | 0.4716 | -0.3783 | -1.7015 | 0.7969 | 1.3232 | -323.9545 | -294.5634 | -2.6842 | -2.7135 |
| 0.4259 | 0.67 | 1300 | 0.4866 | -0.6239 | -1.9703 | 0.7812 | 1.3464 | -329.3312 | -299.4761 | -2.7046 | -2.7356 |
| 0.4935 | 0.72 | 1400 | 0.4747 | -0.5626 | -1.7600 | 0.7812 | 1.1974 | -325.1243 | -298.2491 | -2.7153 | -2.7444 |
| 0.4211 | 0.77 | 1500 | 0.4645 | -0.6099 | -1.9993 | 0.7656 | 1.3894 | -329.9109 | -299.1959 | -2.6944 | -2.7236 |
| 0.4931 | 0.83 | 1600 | 0.4684 | -0.6798 | -2.1082 | 0.7656 | 1.4285 | -332.0890 | -300.5934 | -2.7006 | -2.7305 |
| 0.5029 | 0.88 | 1700 | 0.4595 | -0.5063 | -1.8951 | 0.7812 | 1.3889 | -327.8267 | -297.1233 | -2.7108 | -2.7403 |
| 0.4965 | 0.93 | 1800 | 0.4613 | -0.5561 | -1.9079 | 0.7812 | 1.3518 | -328.0831 | -298.1203 | -2.7226 | -2.7523 |
| 0.4337 | 0.98 | 1900 | 0.4608 | -0.5066 | -1.8718 | 0.7656 | 1.3652 | -327.3599 | -297.1296 | -2.7175 | -2.7469 |
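
To see which logged evaluation step had the lowest validation loss, the Step and Validation Loss columns can be scanned directly. The snippet below is a small sketch; the pairs are copied from the training results above, and `eval_history` is just an illustrative name.

```python
# (step, validation loss) pairs copied from the training results above.
eval_history = [
    (100, 0.5589), (200, 0.5136), (300, 0.5168), (400, 0.4835),
    (500, 0.5054), (600, 0.4910), (700, 0.4838), (800, 0.5077),
    (900, 0.4927), (1000, 0.4772), (1100, 0.4740), (1200, 0.4716),
    (1300, 0.4866), (1400, 0.4747), (1500, 0.4645), (1600, 0.4684),
    (1700, 0.4595), (1800, 0.4613), (1900, 0.4608),
]

# Pick the evaluation step with the smallest validation loss.
best_step, best_loss = min(eval_history, key=lambda row: row[1])
print(f"lowest validation loss {best_loss:.4f} at step {best_step}")
```

Validation loss is only one signal; reward accuracy and margin do not peak at the same step, so the choice of checkpoint depends on which metric matters for the use case.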

Framework versions

  • Transformers 4.34.0
  • PyTorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.14.0
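
To compare a local environment against the versions listed above, something like the following could be used. It relies only on the standard library's `importlib.metadata`; the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are assumed here, and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the list above; keys are the assumed PyPI distribution names.
PINNED = {
    "transformers": "4.34.0",
    "torch": "2.0.1+cu118",
    "datasets": "2.12.0",
    "tokenizers": "0.14.0",
}


def compare_versions(pinned: dict) -> dict:
    """Map each package to (wanted, installed or None, exact match?)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in compare_versions(PINNED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the check only shows where the environment differs from the one used for training.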