pancho-v1-qw25-3B-UNAMGS

This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct. It achieves the following results on the evaluation set:

Loss: 0.6555
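
For intuition, a cross-entropy loss can be converted to a perplexity by exponentiating it. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual Trainer convention but is not stated on this card; the helper name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_ce_loss: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_ce_loss)


eval_loss = 0.6555  # final evaluation loss reported on this card
ppl = perplexity(eval_loss)
print(f"perplexity ~ {ppl:.3f}")
```

Under that assumption the evaluation loss corresponds to a perplexity of roughly 1.93 on the evaluation set.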

Model description

Trained on the following Magpie datasets:

Magpie-Align/Magpie-Llama-3.1-Pro-MT-300K-Filtered
Magpie-Align/Magpie-Pro-MT-300K-v0.1

UNA applied to the MLPs of layers 4, 10, 16, 22, and 28.
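
The card does not document how UNA is implemented, so the following is only a sketch of how the MLP parameters of those layers might be picked out by name. It assumes the Qwen2-style parameter naming `model.layers.<i>.mlp.<proj>` (with `gate_proj`, `up_proj`, `down_proj`) and zero-based layer indices; `is_selected_mlp_param` is a hypothetical helper, not part of any library.

```python
import re

# Layer indices listed on the card; treating them as zero-based is an assumption.
UNA_LAYERS = [4, 10, 16, 22, 28]

# Matches names such as "model.layers.4.mlp.gate_proj.weight".
_MLP_PATTERN = re.compile(r"(?:^|\.)layers\.(\d+)\.mlp\.")


def is_selected_mlp_param(name: str, layers=UNA_LAYERS) -> bool:
    """Return True if `name` is an MLP parameter in one of the given layers."""
    match = _MLP_PATTERN.search(name)
    return match is not None and int(match.group(1)) in layers


# Illustrative parameter names following the assumed naming scheme.
example_names = [
    "model.layers.4.mlp.gate_proj.weight",
    "model.layers.4.self_attn.q_proj.weight",
    "model.layers.5.mlp.up_proj.weight",
    "model.layers.28.mlp.down_proj.weight",
]
selected = [n for n in example_names if is_selected_mlp_param(n)]
print(selected)
```

With a loaded PyTorch model, the same predicate could be applied to the names yielded by `model.named_parameters()`. Note that the listed indices are evenly spaced, six layers apart.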

MGS on 3 Scales.

Follows the findings reported in https://arxiv.org/abs/2410.21228.

License & Derivatives

Any derivative (SFT, merges, etc.) using ANY layer from this model MUST include either UNA or MGS or PANCHO in its model name in order to obtain a LICENSE for derivatives of this model.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 256
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
num_epochs: 1
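
The card lists only the total batch sizes, not the per-device batch size or the number of gradient accumulation steps. As a rough sketch, assuming the usual relation total = per-device batch size x number of devices x gradient accumulation steps (and no accumulation during evaluation), the possible splits can be enumerated as follows. All variable names are illustrative.

```python
num_devices = 8
total_train_batch_size = 256
total_eval_batch_size = 16

# Sequences each device contributes to one optimizer step
# (per-device batch size times gradient accumulation steps).
per_device_effective = total_train_batch_size // num_devices

# Evaluation uses no gradient accumulation, so this is the per-device eval batch.
per_device_eval = total_eval_batch_size // num_devices

# Every (per_device_batch, grad_accum_steps) pair consistent with the totals.
splits = [
    (batch, per_device_effective // batch)
    for batch in range(1, per_device_effective + 1)
    if per_device_effective % batch == 0
]

print(per_device_effective, per_device_eval)
for batch, accum in splits:
    print(f"per_device_batch={batch:>2}  grad_accum_steps={accum:>2}")
```

Any of these pairs would reproduce the reported total of 256; the card does not say which one was used.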

Training results

Training Loss	Epoch	Step	Validation Loss
1.2127	0.0015	1	0.8711
0.9905	0.0509	35	0.7338
0.9685	0.1019	70	0.7114
0.9554	0.1528	105	0.6994
0.9077	0.2037	140	0.6915
0.9149	0.2547	175	0.6859
0.9363	0.3056	210	0.6795
0.8975	0.3566	245	0.6745
0.9095	0.4075	280	0.6709
0.9216	0.4584	315	0.6681
0.9143	0.5094	350	0.6666
0.8879	0.5603	385	0.6645
0.9194	0.6112	420	0.6625
0.9123	0.6622	455	0.6615
0.9056	0.7131	490	0.6591
0.9172	0.7641	525	0.6578
0.886	0.8150	560	0.6566
0.9155	0.8659	595	0.6568
0.9029	0.9169	630	0.6560
0.8942	0.9678	665	0.6555
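
A few summary figures can be derived from the table above. The sketch below copies the rows as reported and computes the relative drop in validation loss, an estimate of the number of optimizer steps per epoch, and how often the validation loss rose between consecutive evaluations. The steps-per-epoch value is only an estimate, because the epoch fractions in the table are rounded.

```python
# Columns: training loss, epoch, step, validation loss (copied from the table).
TABLE = """
1.2127 0.0015 1 0.8711
0.9905 0.0509 35 0.7338
0.9685 0.1019 70 0.7114
0.9554 0.1528 105 0.6994
0.9077 0.2037 140 0.6915
0.9149 0.2547 175 0.6859
0.9363 0.3056 210 0.6795
0.8975 0.3566 245 0.6745
0.9095 0.4075 280 0.6709
0.9216 0.4584 315 0.6681
0.9143 0.5094 350 0.6666
0.8879 0.5603 385 0.6645
0.9194 0.6112 420 0.6625
0.9123 0.6622 455 0.6615
0.9056 0.7131 490 0.6591
0.9172 0.7641 525 0.6578
0.886 0.8150 560 0.6566
0.9155 0.8659 595 0.6568
0.9029 0.9169 630 0.6560
0.8942 0.9678 665 0.6555
"""

rows = [tuple(float(x) for x in line.split()) for line in TABLE.strip().splitlines()]
val_losses = [r[3] for r in rows]

# Relative improvement from the first to the last evaluation.
rel_drop = (val_losses[0] - val_losses[-1]) / val_losses[0]

# Estimated optimizer steps in one full epoch, from the last logged row.
steps_per_epoch = rows[-1][2] / rows[-1][1]

# Number of evaluations where validation loss went up instead of down.
increases = sum(1 for a, b in zip(val_losses, val_losses[1:]) if b > a)

best_step = int(min(rows, key=lambda r: r[3])[2])

print(f"relative drop in validation loss: {rel_drop:.1%}")
print(f"estimated steps per epoch: {steps_per_epoch:.0f}")
print(f"evaluations with a loss increase: {increases}")
print(f"best validation loss at step: {best_step}")
```

The validation loss falls by about a quarter over the run and reaches its lowest logged value at the final evaluation, with a single small uptick between steps 560 and 595.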

Framework versions

PEFT 0.13.2
Transformers 4.45.2
Pytorch 2.3.0+cu121
Datasets 3.0.1
Tokenizers 0.20.1
