OpenELM-1_1B-DPO-full-max-10-reward

This model was trained from scratch on an unknown dataset. On the evaluation set it achieves the following results, which match the final logged evaluation (step 2800) in the training results table:

  • Loss: 1.5022
  • Rewards/chosen: -12.0
  • Rewards/rejected: -14.25
  • Rewards/accuracies: 0.5996
  • Rewards/margins: 2.2188
  • Logps/rejected: -1712.0
  • Logps/chosen: -1520.0
  • Logits/rejected: -1.7422
  • Logits/chosen: -3.6875
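
The metric names above (rewards, margins, accuracies, log-probabilities) are the ones usually reported for DPO-style preference training. The card does not state the loss variant, the value of beta, or the reference model, so the following is only a minimal sketch of how such numbers are commonly derived, assuming the standard sigmoid DPO loss. The function names, `beta=0.1`, and the example log-probabilities are hypothetical.

```python
import math


def dpo_pair_metrics(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss and reward terms for one preference pair.

    beta=0.1 is an assumed value; the card does not report it.
    """
    # Implicit rewards: beta-scaled log-probability gain over the reference model.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)), written as softplus(-margin) for numerical stability.
    loss = math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)
    return {
        "loss": loss,
        "reward_chosen": reward_chosen,
        "reward_rejected": reward_rejected,
        "margin": margin,
        "correct": reward_chosen > reward_rejected,
    }


def summarize(pairs, beta=0.1):
    """Average per-pair metrics over a batch of log-probability tuples."""
    results = [dpo_pair_metrics(*pair, beta=beta) for pair in pairs]
    n = len(results)
    return {
        "loss": sum(r["loss"] for r in results) / n,
        "rewards/chosen": sum(r["reward_chosen"] for r in results) / n,
        "rewards/rejected": sum(r["reward_rejected"] for r in results) / n,
        "rewards/margins": sum(r["margin"] for r in results) / n,
        "rewards/accuracies": sum(r["correct"] for r in results) / n,
    }


# Hypothetical sequence log-probabilities:
# (policy chosen, policy rejected, reference chosen, reference rejected)
example_pairs = [
    (-400.0, -450.0, -380.0, -400.0),  # chosen reward is higher: counted as correct
    (-500.0, -480.0, -470.0, -475.0),  # rejected reward is higher: counted as incorrect
]
print(summarize(example_pairs))
```

Under this formulation a pair with zero margin contributes a loss of ln 2 (about 0.693), so an average evaluation loss above that value can coexist with a positive average margin when some pairs are ranked wrongly by a large amount.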

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
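
The totals in this list follow from the per-device values, and the learning-rate settings describe a cosine decay after a warmup phase. The sketch below reproduces that arithmetic under two assumptions that the card does not confirm: warmup is linear, and the run has roughly 2865 optimizer steps (estimated from the training results table, where step 100 corresponds to epoch 0.1047, so about 955 steps per epoch over 3 epochs). The function and constant names are illustrative.

```python
import math

# Values copied from the hyperparameter list.
TRAIN_BATCH_SIZE = 8       # per device
EVAL_BATCH_SIZE = 16       # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1

# Assumption: estimated from the results table, not stated in the card.
ASSUMED_TOTAL_STEPS = 2865

# Effective batch sizes: gradient accumulation applies to training only.
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES


def cosine_lr(step, total_steps=ASSUMED_TOTAL_STEPS,
              peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size, total_eval_batch_size)
for step in (0, 100, 287, 1000, 2000, 2865):
    print(step, cosine_lr(step))
```

With these assumptions the learning rate peaks at about step 287 and decays to zero by the end of the third epoch.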

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3683 | 0.1047 | 100 | 0.6799 | -1.8125 | -2.1719 | 0.6016 | 0.3555 | -506.0 | -500.0 | -12.125 | -12.4375 |
| 0.2926 | 0.2094 | 200 | 0.7127 | -2.0312 | -2.5156 | 0.6152 | 0.4863 | -540.0 | -520.0 | -10.0 | -10.5625 |
| 0.2695 | 0.3141 | 300 | 0.7960 | -4.5938 | -5.1562 | 0.5801 | 0.5781 | -804.0 | -776.0 | -7.2812 | -8.1875 |
| 0.245 | 0.4188 | 400 | 0.7903 | -4.6562 | -5.25 | 0.5801 | 0.5977 | -812.0 | -784.0 | -8.75 | -9.5625 |
| 0.2375 | 0.5236 | 500 | 0.9612 | -6.75 | -7.875 | 0.6113 | 1.125 | -1080.0 | -992.0 | -7.4688 | -8.6875 |
| 0.2534 | 0.6283 | 600 | 0.8573 | -5.6562 | -6.5 | 0.6133 | 0.8438 | -940.0 | -884.0 | -8.75 | -9.6875 |
| 0.2213 | 0.7330 | 700 | 0.8133 | -4.7812 | -5.7188 | 0.6387 | 0.9336 | -860.0 | -796.0 | -5.75 | -7.3125 |
| 0.2342 | 0.8377 | 800 | 0.8574 | -5.5625 | -6.4688 | 0.6055 | 0.9336 | -936.0 | -872.0 | -7.2812 | -8.5625 |
| 0.199 | 0.9424 | 900 | 0.8853 | -7.1875 | -8.1875 | 0.6074 | 0.9570 | -1104.0 | -1040.0 | -4.6562 | -6.0938 |
| 0.0529 | 1.0471 | 1000 | 1.1147 | -8.5 | -9.75 | 0.6055 | 1.2734 | -1264.0 | -1168.0 | -4.4062 | -6.2188 |
| 0.058 | 1.1518 | 1100 | 1.0443 | -6.25 | -7.25 | 0.5977 | 1.0 | -1012.0 | -940.0 | -7.9375 | -9.1875 |
| 0.0436 | 1.2565 | 1200 | 1.1756 | -9.5625 | -10.875 | 0.6133 | 1.3438 | -1376.0 | -1272.0 | -1.3125 | -3.0938 |
| 0.0353 | 1.3613 | 1300 | 1.2987 | -8.75 | -10.4375 | 0.5859 | 1.6875 | -1328.0 | -1192.0 | -5.2812 | -7.0625 |
| 0.0576 | 1.4660 | 1400 | 1.0486 | -8.0625 | -9.5625 | 0.6172 | 1.4609 | -1240.0 | -1128.0 | -4.625 | -6.4688 |
| 0.0444 | 1.5707 | 1500 | 1.1459 | -8.875 | -10.5 | 0.6113 | 1.6484 | -1344.0 | -1208.0 | -1.9141 | -3.9219 |
| 0.0475 | 1.6754 | 1600 | 1.1818 | -8.5625 | -10.125 | 0.5918 | 1.5547 | -1304.0 | -1176.0 | -2.5938 | -4.5625 |
| 0.0644 | 1.7801 | 1700 | 1.2222 | -9.625 | -11.25 | 0.6055 | 1.6562 | -1416.0 | -1280.0 | -2.7344 | -4.5938 |
| 0.0397 | 1.8848 | 1800 | 1.0832 | -7.8125 | -9.375 | 0.6172 | 1.5469 | -1224.0 | -1096.0 | -3.3438 | -5.375 |
| 0.0254 | 1.9895 | 1900 | 1.1882 | -9.8125 | -11.4375 | 0.6191 | 1.6719 | -1432.0 | -1296.0 | -3.7344 | -5.4688 |
| 0.0037 | 2.0942 | 2000 | 1.3353 | -11.125 | -13.125 | 0.6133 | 1.9766 | -1600.0 | -1432.0 | -2.5938 | -4.5312 |
| 0.0048 | 2.1990 | 2100 | 1.5185 | -12.1875 | -14.375 | 0.5996 | 2.2031 | -1728.0 | -1536.0 | -2.7656 | -4.7188 |
| 0.0045 | 2.3037 | 2200 | 1.5012 | -12.4375 | -14.625 | 0.6133 | 2.1875 | -1752.0 | -1560.0 | -1.75 | -3.6406 |
| 0.0108 | 2.4084 | 2300 | 1.5281 | -12.3125 | -14.5625 | 0.6074 | 2.2344 | -1744.0 | -1552.0 | -1.8047 | -3.75 |
| 0.0056 | 2.5131 | 2400 | 1.5154 | -12.125 | -14.3125 | 0.6074 | 2.2188 | -1720.0 | -1528.0 | -1.6797 | -3.625 |
| 0.0051 | 2.6178 | 2500 | 1.5115 | -12.1875 | -14.4375 | 0.6035 | 2.2188 | -1728.0 | -1536.0 | -1.5234 | -3.4531 |
| 0.0041 | 2.7225 | 2600 | 1.4846 | -11.8125 | -14.0625 | 0.5938 | 2.2031 | -1696.0 | -1504.0 | -1.8047 | -3.75 |
| 0.0049 | 2.8272 | 2700 | 1.5020 | -12.0 | -14.25 | 0.5977 | 2.2344 | -1712.0 | -1520.0 | -1.7266 | -3.6719 |
| 0.0063 | 2.9319 | 2800 | 1.5022 | -12.0 | -14.25 | 0.5996 | 2.2188 | -1712.0 | -1520.0 | -1.7422 | -3.6875 |
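
The table above can be scanned programmatically to see where the evaluation metrics peak relative to the final checkpoint. The sketch below copies four columns (step, training loss, validation loss, reward accuracy) from the table; the variable names and the selection criteria are illustrative choices, not something the card prescribes.

```python
# (step, training_loss, validation_loss, rewards_accuracy) copied from the table.
ROWS = [
    (100, 0.3683, 0.6799, 0.6016), (200, 0.2926, 0.7127, 0.6152),
    (300, 0.2695, 0.7960, 0.5801), (400, 0.245, 0.7903, 0.5801),
    (500, 0.2375, 0.9612, 0.6113), (600, 0.2534, 0.8573, 0.6133),
    (700, 0.2213, 0.8133, 0.6387), (800, 0.2342, 0.8574, 0.6055),
    (900, 0.199, 0.8853, 0.6074), (1000, 0.0529, 1.1147, 0.6055),
    (1100, 0.058, 1.0443, 0.5977), (1200, 0.0436, 1.1756, 0.6133),
    (1300, 0.0353, 1.2987, 0.5859), (1400, 0.0576, 1.0486, 0.6172),
    (1500, 0.0444, 1.1459, 0.6113), (1600, 0.0475, 1.1818, 0.5918),
    (1700, 0.0644, 1.2222, 0.6055), (1800, 0.0397, 1.0832, 0.6172),
    (1900, 0.0254, 1.1882, 0.6191), (2000, 0.0037, 1.3353, 0.6133),
    (2100, 0.0048, 1.5185, 0.5996), (2200, 0.0045, 1.5012, 0.6133),
    (2300, 0.0108, 1.5281, 0.6074), (2400, 0.0056, 1.5154, 0.6074),
    (2500, 0.0051, 1.5115, 0.6035), (2600, 0.0041, 1.4846, 0.5938),
    (2700, 0.0049, 1.5020, 0.5977), (2800, 0.0063, 1.5022, 0.5996),
]

# Checkpoint with the lowest validation loss.
best_loss_row = min(ROWS, key=lambda row: row[2])
# Checkpoint with the highest reward accuracy.
best_acc_row = max(ROWS, key=lambda row: row[3])
# Last logged evaluation, which is what the card's headline metrics report.
final_row = ROWS[-1]

print("lowest validation loss at step", best_loss_row[0])   # step 100
print("highest reward accuracy at step", best_acc_row[0])   # step 700
print("final step", final_row[0], "validation loss", final_row[2])
```

In these logs the training loss keeps falling through the third epoch while the validation loss rises from its minimum at step 100, and the reward accuracy stays near 0.60 throughout.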

Framework versions

  • Transformers 4.45.1
  • Pytorch 2.3.0
  • Datasets 3.0.1
  • Tokenizers 0.20.0
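
To reproduce the environment it can help to compare the installed packages against the versions listed above. The following is a small sketch using only the standard library; the mapping of "Pytorch" to the distribution name `torch` and the helper name `find_mismatches` are assumptions made for illustration.

```python
from importlib import metadata

# Versions copied from the "Framework versions" list. Keys are the assumed
# PyPI distribution names ("torch" for Pytorch).
EXPECTED = {
    "transformers": "4.45.1",
    "torch": "2.3.0",
    "datasets": "3.0.1",
    "tokenizers": "0.20.0",
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {name: (wanted, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        # Ignore local build tags such as "+cu121" that some wheels carry.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: card lists {wanted}, found {installed}")
```

A different installed version is not necessarily a problem; the check only reports where the local environment departs from the one recorded in the card.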

The weights are stored in Safetensors format: model size 1.08B params, tensor type BF16.
Inference API (serverless) does not yet support model repos that contain custom code.