llama-gsm-real-and-synthetic-sftsd1

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.9747
Num Input Tokens Seen: 3594944
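The card reports raw loss values only. The sketch below converts them to perplexity. It assumes the reported loss is the mean per-token cross-entropy in nats, which is what the standard causal-LM loss computes. Under that assumption, perplexity is exp(loss). The perplexity figures are derived here and are not reported by the card. The function name `perplexity` is hypothetical.

```python
import math

# Loss values taken from this card: the final evaluation loss, and the
# validation loss before any training (step 0 in the results table).
FINAL_EVAL_LOSS = 0.9747
INITIAL_EVAL_LOSS = 1.5937


def perplexity(mean_ce_loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy measured in nats (assumption)."""
    return math.exp(mean_ce_loss)


print(f"initial: {perplexity(INITIAL_EVAL_LOSS):.2f}")  # initial: 4.92
print(f"final:   {perplexity(FINAL_EVAL_LOSS):.2f}")    # final:   2.65
```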

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 8e-06
train_batch_size: 8
eval_batch_size: 16
seed: 1
gradient_accumulation_steps: 16
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: constant_with_warmup
lr_scheduler_warmup_ratio: 0.05
num_epochs: 1
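The sketch below shows how two of the hyperparameters above fit together.

- The effective batch size of 128 is train_batch_size × gradient_accumulation_steps × number of devices. A single device is assumed here because 8 × 16 already equals 128.
- Under constant_with_warmup, the learning rate ramps linearly from 0 to 8e-06 over the warmup steps and then stays flat. The warmup length is the warmup ratio times the total step count, rounded up.

The ramp is intended to mirror the multiplier used by `transformers.get_constant_schedule_with_warmup`. It is a hedged re-implementation, not that library code.

The total of 116 optimizer steps is an assumption. It is estimated from the results table, where step 115 is logged at epoch 0.9861. All helper names are hypothetical.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 8e-06
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 16
WARMUP_RATIO = 0.05

# Assumption: one device, consistent with 8 x 16 = 128.
NUM_DEVICES = 1
# Assumption: about 116 optimizer steps in the single epoch (estimated from the table).
NUM_TRAINING_STEPS = 116


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Examples consumed per optimizer step."""
    return per_device * accum * devices


def warmup_steps(num_training_steps: int, ratio: float) -> int:
    """Warmup length derived from a ratio, rounded up."""
    return math.ceil(num_training_steps * ratio)


def constant_with_warmup_lr(step: int, base_lr: float, n_warmup: int) -> float:
    """Linear ramp from 0 to base_lr over n_warmup steps, then constant."""
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    return base_lr


n_warmup = warmup_steps(NUM_TRAINING_STEPS, WARMUP_RATIO)
print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 128
print(n_warmup)  # 6
for step in (0, 3, 6, 50):
    print(step, constant_with_warmup_lr(step, LEARNING_RATE, n_warmup))
```

With these assumptions, warmup covers only the first six or so optimizer steps. After that, the whole epoch runs at the constant rate of 8e-06.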

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
No log	0	0	1.5937	0
1.2701	0.0429	5	1.4359	158456
1.0045	0.0857	10	1.2327	313520
1.0011	0.1286	15	1.1670	463960
0.9687	0.1715	20	1.1227	616216
0.8884	0.2144	25	1.1006	768608
0.879	0.2572	30	1.0785	933288
0.8624	0.3001	35	1.0626	1083272
0.8277	0.3430	40	1.0467	1244152
0.8307	0.3859	45	1.0267	1396808
0.791	0.4287	50	1.0107	1555096
0.7929	0.4716	55	1.0002	1710680
0.7695	0.5145	60	0.9954	1864128
0.7651	0.5573	65	0.9924	2018480
0.7788	0.6002	70	0.9886	2173056
0.7423	0.6431	75	0.9863	2326744
0.7635	0.6860	80	0.9835	2483616
0.7709	0.7288	85	0.9826	2640104
0.7663	0.7717	90	0.9797	2796664
0.7859	0.8146	95	0.9783	2950688
0.7699	0.8574	100	0.9772	3107872
0.7484	0.9003	105	0.9769	3258376
0.7532	0.9432	110	0.9740	3411448
0.7386	0.9861	115	0.9756	3567688
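The sketch below summarises the table above. It finds the logged step with the lowest validation loss and estimates the average number of input tokens consumed per optimizer step.

The per-example figure assumes every step consumes exactly 128 examples. Input tokens, as counted by the trainer, may include padding. Helper names are hypothetical.

```python
# (step, validation_loss, input_tokens_seen) copied from the table above.
ROWS = [
    (0, 1.5937, 0),
    (5, 1.4359, 158456),
    (10, 1.2327, 313520),
    (15, 1.1670, 463960),
    (20, 1.1227, 616216),
    (25, 1.1006, 768608),
    (30, 1.0785, 933288),
    (35, 1.0626, 1083272),
    (40, 1.0467, 1244152),
    (45, 1.0267, 1396808),
    (50, 1.0107, 1555096),
    (55, 1.0002, 1710680),
    (60, 0.9954, 1864128),
    (65, 0.9924, 2018480),
    (70, 0.9886, 2173056),
    (75, 0.9863, 2326744),
    (80, 0.9835, 2483616),
    (85, 0.9826, 2640104),
    (90, 0.9797, 2796664),
    (95, 0.9783, 2950688),
    (100, 0.9772, 3107872),
    (105, 0.9769, 3258376),
    (110, 0.9740, 3411448),
    (115, 0.9756, 3567688),
]
TOTAL_TRAIN_BATCH_SIZE = 128


def best_checkpoint(rows):
    """Return (step, loss) for the row with the lowest validation loss."""
    step, loss, _ = min(rows, key=lambda r: r[1])
    return step, loss


def mean_tokens_per_step(rows):
    """Average input tokens per optimizer step, up to the last logged row."""
    last_step, _, last_tokens = rows[-1]
    return last_tokens / last_step


tokens_per_step = mean_tokens_per_step(ROWS)
print(best_checkpoint(ROWS))                            # (110, 0.974)
print(round(tokens_per_step))                           # 31023
print(round(tokens_per_step / TOTAL_TRAIN_BATCH_SIZE))  # 242
```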

Framework versions

Transformers 4.44.0
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
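When reproducing this run, it can help to confirm that the installed packages match the versions listed above. The sketch below uses only the standard library's `importlib.metadata`. It assumes PyTorch is installed under its PyPI distribution name `torch`. A locally built wheel may report a different build suffix than `+cu121`. The helper name `check_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above; the PyPI distribution for PyTorch is named "torch".
EXPECTED = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected):
    """Map each package to (installed version or None, exact match with expected)."""
    report = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None  # package is not installed in this environment
        report[name] = (have, have == want)
    return report


for name, (have, ok) in check_versions(EXPECTED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name}: expected {EXPECTED[name]}, found {have} ({status})")
```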

Model tree for jkazdan/llama3b-real-and-synthetic-sftsd1
