pancho-v1-qw25-3B-UNAMGS
This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct. On the evaluation set, the last logged validation loss is 0.6555 (step 665, epoch 0.9678; see Training results).
Model description
Trained on Magpie datasets:
- Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
- Magpie-Align/Magpie-Pro-MT-300K-v0.1
UNA on MLPs 4, 10, 16, 22, 28
MGS on 3 Scales.
Following the findings of https://arxiv.org/abs/2410.21228.
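The layer indices listed for UNA are evenly spaced: every sixth decoder block, starting at 4. As a minimal sketch, assuming the Qwen2-style module naming used by Transformers (`model.layers.<i>.mlp`), the affected MLP modules could be enumerated as below. The helper `una_mlp_module_names` is hypothetical, and the snippet only lists module paths; it does not reproduce the UNA procedure itself.

```python
# Layer indices taken from the model description above.
UNA_MLP_LAYERS = [4, 10, 16, 22, 28]


def una_mlp_module_names(layers=UNA_MLP_LAYERS):
    """Return MLP module paths for the given decoder layers.

    Hypothetical helper; assumes Qwen2-style naming `model.layers.<i>.mlp`.
    """
    return [f"model.layers.{i}.mlp" for i in layers]


# The indices form an arithmetic progression: start 4, step 6.
assert UNA_MLP_LAYERS == list(range(4, 29, 6))

print(una_mlp_module_names())
```

Such a list of paths is the kind of input one might use to select modules for inspection or freezing; how UNA actually treats these layers is not stated in this card.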
License & Derivatives
Any derivative (SFT, merges, etc.) using ANY layer from this model MUST include either UNA, MGS, or PANCHO in its model name in order to obtain a LICENSE for derivatives of this model.
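The naming rule above can be checked mechanically before publishing a derivative. The following is a rough sketch, not an official validator: the function name `name_satisfies_license` is hypothetical, and treating the match as a case-insensitive substring test is an assumption, since the card does not say how the tag must appear.

```python
# Tags named in the license clause above.
REQUIRED_TAGS = ("UNA", "MGS", "PANCHO")


def name_satisfies_license(model_name: str) -> bool:
    """Return True if the name contains at least one required tag.

    Assumption: a case-insensitive substring match is sufficient.
    """
    upper = model_name.upper()
    return any(tag in upper for tag in REQUIRED_TAGS)


# Example names below are made up for illustration.
print(name_satisfies_license("my-merge-3B-UNA"))
print(name_satisfies_license("my-merge-3B"))
```

Because this is a plain substring test, unrelated words that happen to contain a tag would also pass; a stricter check would need a rule the card does not provide.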
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 256
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1
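The total batch sizes listed above are products over the 8 devices. The card does not give the per-device train batch size or the number of gradient accumulation steps, so only their product can be recovered; the short calculation below makes that explicit. Variable names are illustrative.

```python
# Values copied from the hyperparameter list above.
num_devices = 8
total_train_batch_size = 256
total_eval_batch_size = 16

# Per-device train batch size times gradient accumulation steps
# (the split between the two is not stated in the card).
train_per_device_product = total_train_batch_size // num_devices

# Evaluation uses no gradient accumulation, so this is the per-device eval batch.
eval_per_device = total_eval_batch_size // num_devices

print(train_per_device_product)  # 32
print(eval_per_device)           # 2
```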
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 1.2127 | 0.0015 | 1 | 0.8711 |
| 0.9905 | 0.0509 | 35 | 0.7338 |
| 0.9685 | 0.1019 | 70 | 0.7114 |
| 0.9554 | 0.1528 | 105 | 0.6994 |
| 0.9077 | 0.2037 | 140 | 0.6915 |
| 0.9149 | 0.2547 | 175 | 0.6859 |
| 0.9363 | 0.3056 | 210 | 0.6795 |
| 0.8975 | 0.3566 | 245 | 0.6745 |
| 0.9095 | 0.4075 | 280 | 0.6709 |
| 0.9216 | 0.4584 | 315 | 0.6681 |
| 0.9143 | 0.5094 | 350 | 0.6666 |
| 0.8879 | 0.5603 | 385 | 0.6645 |
| 0.9194 | 0.6112 | 420 | 0.6625 |
| 0.9123 | 0.6622 | 455 | 0.6615 |
| 0.9056 | 0.7131 | 490 | 0.6591 |
| 0.9172 | 0.7641 | 525 | 0.6578 |
| 0.8860 | 0.8150 | 560 | 0.6566 |
| 0.9155 | 0.8659 | 595 | 0.6568 |
| 0.9029 | 0.9169 | 630 | 0.6560 |
| 0.8942 | 0.9678 | 665 | 0.6555 |
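Two quantities can be read off the results table with simple arithmetic: the approximate number of optimizer steps per epoch (from any step/epoch pair) and the relative drop in validation loss between the first and last logged evaluations. The sketch below uses only values from the table; epoch values are rounded in the log, so the step count is an estimate.

```python
# (step, epoch, validation loss) copied from the first and last table rows.
first = (1, 0.0015, 0.8711)
last = (665, 0.9678, 0.6555)

# Steps per epoch, estimated from the last logged row.
steps_per_epoch = last[0] / last[1]

# Relative reduction in validation loss over the logged run.
relative_drop = (first[2] - last[2]) / first[2]

print(round(steps_per_epoch))   # 687
print(f"{relative_drop:.2%}")   # 24.75%
```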
Framework versions
- PEFT 0.13.2
- Transformers 4.45.2
- Pytorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1