Mistral-7B-Ours-SFT-SCDPO

This model is a fine-tuned version of MathGenie/Mistral-7B-Ours-SFT. It achieves the following results on the evaluation set:

Loss: 0.1793
Rewards/chosen: 0.2587
Rewards/rejected: -7.0301
Rewards/accuracies: 0.8947
Rewards/margins: 7.2889
Logps/rejected: -253.7773
Logps/chosen: -80.3105
Logits/rejected: -2.3417
Logits/chosen: -2.3846
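
The metric names above match those logged by TRL's DPOTrainer, so the sketch below assumes the standard DPO definitions: the implicit reward of a response is beta times the gap between the policy and reference log-probabilities, the margin is the chosen reward minus the rejected reward, and the loss is the negative log-sigmoid of the margin. The value of beta and the reference log-probabilities are not reported in this card; the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def dpo_pair_metrics(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO quantities under the standard formulation.

    beta=0.1 is an assumed default; this card does not state the value used.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    accuracy = 1.0 if margin > 0 else 0.0
    return {
        "chosen_reward": chosen_reward,
        "rejected_reward": rejected_reward,
        "margin": margin,
        "loss": loss,
        "accuracy": accuracy,
    }


# Hypothetical log-probabilities, not taken from this model's evaluation run.
example = dpo_pair_metrics(-80.0, -250.0, -85.0, -180.0, beta=0.1)
print(example)
```

The reported loss is an average of per-pair losses, so it does not have to equal the loss evaluated at the average margin.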

Model description

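This model is tuned for mathematical problem solving. It starts from the supervised fine-tuned checkpoint MathGenie/Mistral-7B-Ours-SFT and, as the SCDPO suffix and the reward metrics above indicate, adds a round of Step-Controlled DPO preference optimization on top of it.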

Intended uses & limitations

The model is intended for solving math problems.
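
A minimal loading sketch with the Transformers library follows. It assumes the repository is accessible, that the `accelerate` package is installed (needed for `device_map="auto"`), and that a plain-text question is an acceptable prompt; this card does not document a prompt template, so the `solve` helper and its prompt handling are hypothetical.

```python
MODEL_ID = "MathGenie/Mistral-7B-Ours-SFT-SCDPO"


def solve(question: str, max_new_tokens: int = 512) -> str:
    """Generate a greedy completion for a math question (illustrative helper)."""
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # the weights are published in BF16
        device_map="auto",
    )
    # Assumption: the raw question is used as the prompt, with no template.
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads roughly 7B parameters on first use):
# print(solve("What is 12 * 7 + 5?"))
```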

Training and evaluation data

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-07
train_batch_size: 2
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 8
total_train_batch_size: 64
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 2
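
The totals in this list follow from the per-device values, and the learning rate follows a warmup-then-cosine shape. The sketch below reproduces that arithmetic in plain Python. The `cosine_lr` function mirrors the usual linear-warmup plus half-cosine decay and is an illustration rather than the exact scheduler code; `ESTIMATED_TOTAL_STEPS` is an assumption extrapolated from the results table (step 900 at epoch 1.93), not a value stated in this card.

```python
import math

per_device_train_batch_size = 2
per_device_eval_batch_size = 4
num_devices = 4
gradient_accumulation_steps = 8

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices

# Assumption: roughly 900 / 1.93 * 2 optimizer steps over the 2 epochs.
ESTIMATED_TOTAL_STEPS = 934


def cosine_lr(step, total_steps, peak_lr=1e-7, warmup_ratio=0.1):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_train_batch_size, total_eval_batch_size)
for step in (0, 94, 500, ESTIMATED_TOTAL_STEPS):
    print(step, cosine_lr(step, ESTIMATED_TOTAL_STEPS))
```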

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.3963	0.21	100	0.3636	1.8634	-0.1518	0.8816	2.0152	-184.9944	-64.2644	-2.7112	-2.7505
0.2849	0.43	200	0.2598	0.7706	-3.7221	0.8816	4.4927	-220.6974	-75.1921	-2.5067	-2.5475
0.2496	0.64	300	0.2295	0.9323	-4.2717	0.8684	5.2040	-226.1934	-73.5753	-2.5080	-2.5494
0.2331	0.86	400	0.2089	0.7871	-4.8912	0.8684	5.6783	-232.3884	-75.0269	-2.4967	-2.5382
0.0874	1.07	500	0.1872	0.6345	-5.7444	0.8816	6.3789	-240.9202	-76.5527	-2.4323	-2.4761
0.1217	1.28	600	0.1832	0.2282	-6.6907	0.8684	6.9188	-250.3827	-80.6161	-2.3741	-2.4172
0.0966	1.5	700	0.1807	0.1849	-7.0125	0.8816	7.1975	-253.6012	-81.0485	-2.3503	-2.3940
0.0755	1.71	800	0.1802	0.3224	-6.9539	0.8947	7.2763	-253.0150	-79.6739	-2.3437	-2.3867
0.1177	1.93	900	0.1793	0.2587	-7.0301	0.8947	7.2889	-253.7773	-80.3105	-2.3417	-2.3846
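
As a consistency check on the table, the sketch below copies four of its columns into Python and confirms two properties: each reported margin equals the chosen reward minus the rejected reward up to rounding in the fourth decimal, and the validation loss decreases at every evaluation, so the last logged step has the lowest loss. The helper names are illustrative.

```python
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins)
ROWS = [
    (100, 0.3636, 1.8634, -0.1518, 2.0152),
    (200, 0.2598, 0.7706, -3.7221, 4.4927),
    (300, 0.2295, 0.9323, -4.2717, 5.2040),
    (400, 0.2089, 0.7871, -4.8912, 5.6783),
    (500, 0.1872, 0.6345, -5.7444, 6.3789),
    (600, 0.1832, 0.2282, -6.6907, 6.9188),
    (700, 0.1807, 0.1849, -7.0125, 7.1975),
    (800, 0.1802, 0.3224, -6.9539, 7.2763),
    (900, 0.1793, 0.2587, -7.0301, 7.2889),
]


def max_margin_discrepancy(rows):
    """Largest gap between the reported margin and chosen - rejected."""
    return max(abs((chosen - rejected) - margin)
               for _, _, chosen, rejected, margin in rows)


def best_step(rows):
    """Step with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])[0]


def loss_is_decreasing(rows):
    """True if validation loss strictly decreases from one evaluation to the next."""
    losses = [row[1] for row in rows]
    return all(later < earlier for earlier, later in zip(losses, losses[1:]))


print(max_margin_discrepancy(ROWS))
print(best_step(ROWS))
print(loss_is_decreasing(ROWS))
```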

Framework versions

Transformers 4.38.2
PyTorch 2.1.2
Datasets 2.14.6
Tokenizers 0.15.2

Model size (Safetensors): 7.24B params
Tensor type: BF16


Collection including MathGenie/Mistral-7B-Ours-SFT-SCDPO

Step-Controlled DPO: models and datasets of Step-Controlled DPO.
