llama-3-sauce-v2-8B
This model is based on Llama-3-8b and is governed by the META LLAMA 3 COMMUNITY LICENSE AGREEMENT.
This is a bad finetune of nbeerbower/llama-3-spicy-abliterated-stella-8B using various DPO datasets.
Chat Format
Please use the ChatML format or you may experience poor results.
<|im_start|>system
{System Prompt Here!}<|im_end|>
<|im_start|>user
{Message from User}<|im_end|>
<|im_start|>assistant
{Message from AI}<|im_end|>
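A minimal sketch of assembling this format in Python. The helper name `build_chatml_prompt` and the sample messages are hypothetical and are not part of this repository; the newline after each `<|im_end|>` follows the dataset formatting shown under Configuration.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML string.

    Hypothetical helper for illustration only.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Made-up conversation used only to show the layout.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Name one prime number."},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

If the tokenizer shipped with the model defines a chat template, `tokenizer.apply_chat_template` may be a better choice than hand-built strings; whether it produces ChatML for this model is not confirmed here.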
Method
Finetuned using an A100 on Google Colab.
Reference: "Fine-tune a Mistral-7b model with Direct Preference Optimization" by Maxime Labonne.
Configuration
Dataset preparation:
from datasets import concatenate_datasets, load_dataset
from transformers import AutoTokenizer

# Note: model_name (the base model ID) is defined elsewhere in the notebook.


def chatml_format(example):
    # Format system
    system = ""
    if example.get('system') and len(example['system']) > 0:
        system_message = example['system']
        system = "<|im_start|>system\n" + system_message + "<|im_end|>\n"

    # Format instruction
    prompt = "<|im_start|>user\n" + example['prompt'] + "<|im_end|>\n<|im_start|>assistant\n"

    # Format chosen answer
    chosen = example['chosen'] + "<|im_end|>\n"

    # Format rejected answer
    rejected = example['rejected'] + "<|im_end|>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }


# Array of datasets to concat
ds = [
    "jondurbin/truthy-dpo-v0.1",
    "jondurbin/gutenberg-dpo-v0.1",
    "flammenai/FlameMix-DPO-v1"
]

# load_dataset and combine all
loaded_datasets = [load_dataset(dataset_name, split='train') for dataset_name in ds]
dataset = concatenate_datasets(loaded_datasets)

# Save columns
original_columns = dataset.column_names

# Tokenizer (the EOS token doubles as the pad token)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Format dataset, dropping the original columns so only prompt/chosen/rejected remain
dataset = dataset.map(
    chatml_format,
    remove_columns=original_columns
)
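To make the output of the formatting step concrete, here is a small sketch that applies the same logic to one hand-written record. The record is invented for illustration and does not come from any of the datasets listed above; the function is repeated only so the snippet runs on its own.

```python
def chatml_format(example):
    # Same logic as the dataset preparation code, repeated for a standalone run.
    system = ""
    if example.get("system") and len(example["system"]) > 0:
        system = "<|im_start|>system\n" + example["system"] + "<|im_end|>\n"
    prompt = (
        "<|im_start|>user\n" + example["prompt"]
        + "<|im_end|>\n<|im_start|>assistant\n"
    )
    return {
        "prompt": system + prompt,
        "chosen": example["chosen"] + "<|im_end|>\n",
        "rejected": example["rejected"] + "<|im_end|>\n",
    }


# Hypothetical preference record (not taken from the real datasets).
sample = {
    "system": "Answer briefly.",
    "prompt": "What color is the sky on a clear day?",
    "chosen": "Blue.",
    "rejected": "Green.",
}

formatted = chatml_format(sample)
for key, value in formatted.items():
    print(key, repr(value))
```

The prompt ends with an open assistant turn, so the chosen and rejected strings are scored as two alternative completions of the same context.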
LoRA, model, and training settings:
import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# Note: model_name and new_model (the output directory) are defined elsewhere in the notebook.

# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    max_steps=4000,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
# (keyword arguments follow the TRL release used for this run; newer
# releases may expect some of them in a DPOConfig instead)
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    force_use_ref_model=True
)

# Fine-tune model with DPO
dpo_trainer.train()
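For context on what beta=0.1 controls, here is a rough sketch of the per-pair DPO objective in its standard sigmoid form. It is plain Python for illustration, not the TRL implementation; the function name `dpo_loss` and the log-probability values are made up.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid-form DPO loss for one preference pair (illustrative sketch)."""
    # How much more (or less) likely each answer is under the policy
    # than under the frozen reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # beta scales how strongly the preference margin is rewarded.
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is equal to log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


# At the start of training the policy equals the reference, so the margin
# is zero and the loss is ln(2), roughly 0.6931.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Hypothetical later state: the policy has shifted toward the chosen answer.
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)

print(baseline, improved)
```

A larger beta penalizes drifting from the reference model more sharply for the same log-probability shift; a smaller beta lets the policy move further before the loss saturates.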
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 70.38 |
AI2 Reasoning Challenge (25-Shot) | 65.61 |
HellaSwag (10-Shot) | 83.11 |
MMLU (5-Shot) | 67.98 |
TruthfulQA (0-shot) | 56.39 |
Winogrande (5-shot) | 76.72 |
GSM8k (5-shot) | 72.48 |
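The Avg. row appears to be the unweighted mean of the six benchmark scores. A quick check, using only the values from the table above:

```python
# Scores copied from the results table.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 65.61,
    "HellaSwag (10-Shot)": 83.11,
    "MMLU (5-Shot)": 67.98,
    "TruthfulQA (0-shot)": 56.39,
    "Winogrande (5-shot)": 76.72,
    "GSM8k (5-shot)": 72.48,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 70.38
```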
Model tree for nbeerbower/llama-3-sauce-v2-8B
Base model: nbeerbower/llama-3-bophades-v1-8B
Evaluation results (Open LLM Leaderboard)
- AI2 Reasoning Challenge (25-Shot), test set: normalized accuracy 65.61
- HellaSwag (10-Shot), validation set: normalized accuracy 83.11
- MMLU (5-Shot), test set: accuracy 67.98
- TruthfulQA (0-shot), validation set: mc2 56.39
- Winogrande (5-shot), validation set: accuracy 76.72
- GSM8k (5-shot), test set: accuracy 72.48