# Mistral-7B-base-simpo-qlora
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:
- Loss: 1.5543
- Rewards/chosen: -2.0201
- Rewards/rejected: -2.5529
- Rewards/accuracies: 0.6215
- Rewards/margins: 0.5328
- Logps/rejected: -1.2765
- Logps/chosen: -1.0100
- Logits/rejected: -2.1352
- Logits/chosen: -2.2380
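The reward metrics above follow the SimPO formulation, in which the implicit reward is the length-normalized log-probability of a response scaled by a factor beta, with no reference model involved. As a minimal sketch only: the reported numbers are consistent with beta ≈ 2.0 (Rewards/chosen ≈ 2 × Logps/chosen), but beta is not stated in this card, so treat it as an assumption.

```python
import torch

def simpo_rewards(chosen_logps, chosen_lengths, rejected_logps, rejected_lengths, beta=2.0):
    """Sketch of how SimPO-style reward metrics relate to summed token log-probs.

    beta=2.0 is an assumption (common SimPO value, consistent with the numbers
    reported above); it is not taken from this card.
    """
    # SimPO's implicit reward is the average (length-normalized) log-probability
    # of the sequence, scaled by beta -- no reference model is used.
    reward_chosen = beta * chosen_logps / chosen_lengths
    reward_rejected = beta * rejected_logps / rejected_lengths
    margin = reward_chosen - reward_rejected          # "Rewards/margins"
    accuracy = (margin > 0).float().mean()            # "Rewards/accuracies"
    return reward_chosen, reward_rejected, margin, accuracy
```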
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
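As a rough illustration, the hyperparameters above map onto `transformers.TrainingArguments` as sketched below. This is an assumption about how they might be wired up; the actual training script (for example an alignment-handbook / TRL recipe with its own config class) may differ.

```python
from transformers import TrainingArguments

# Hypothetical mapping of the listed hyperparameters; not the exact script used.
training_args = TrainingArguments(
    output_dir="Mistral-7B-base-simpo-qlora",
    learning_rate=3e-7,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,   # effective train batch size: 2 x 8 = 16
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), eps=1e-8 (defaults)
    bf16=True,                       # assumption: precision is not stated in this card
)
```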
### Training results
| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.6117 | 0.1047 | 400 | -2.2513 | -2.1455 | -0.9526 | -1.1212 | 1.6171 | 0.6010 | -1.9052 | 0.3373 | -2.2425 |
| 1.5829 | 0.2094 | 800 | -2.2393 | -2.1341 | -0.9938 | -1.2007 | 1.5888 | 0.6160 | -1.9876 | 0.4139 | -2.4015 |
| 1.5829 | 0.3141 | 1200 | -2.2356 | -2.1315 | -0.9915 | -1.2316 | 1.5656 | 0.6235 | -1.9830 | 0.4802 | -2.4632 |
| 1.6544 | 0.4187 | 1600 | -2.2392 | -2.1362 | -1.0204 | -1.2795 | 1.5601 | 0.6205 | -2.0408 | 0.5182 | -2.5590 |
| 1.4432 | 0.5234 | 2000 | -2.2398 | -2.1370 | -1.0143 | -1.2770 | 1.5560 | 0.6215 | -2.0287 | 0.5254 | -2.5541 |
| 1.5835 | 0.6281 | 2400 | -2.2387 | -2.1360 | -1.0393 | -1.3078 | 1.5582 | 0.6215 | -2.0787 | 0.5369 | -2.6156 |
| 1.5021 | 0.7328 | 2800 | -2.2395 | -2.1368 | -1.0048 | -1.2707 | 1.5540 | 0.6235 | -2.0096 | 0.5317 | -2.5414 |
| 1.6684 | 0.8375 | 3200 | -2.2405 | -2.1379 | -1.0095 | -1.2763 | 1.5542 | 0.6215 | -2.0191 | 0.5334 | -2.5525 |
| 1.5034 | 0.9422 | 3600 | -2.2372 | -2.1342 | -1.0110 | -1.2775 | 1.5546 | 0.6210 | -2.0219 | 0.5331 | -2.5550 |
### Framework versions
- PEFT 0.11.1
- Transformers 4.42.2
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
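Since the card names PEFT and a QLoRA setup, a minimal usage sketch for loading the adapter in 4-bit is given below. The repo ids, the choice of base checkpoint, and the 4-bit settings are assumptions, not taken from the card; the adapter may instead expect the SFT checkpoint (alignment-handbook/zephyr-7b-sft-qlora) as its base.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"              # assumed base; may differ
adapter_id = "Yofuria/Mistral-7B-base-simpo-qlora" # this adapter repo

# 4-bit (QLoRA-style) loading; quantization settings are assumptions.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the SimPO-trained LoRA adapter

prompt = "Explain QLoRA in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```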