File size: 3,294 Bytes
be581a9 9972149 be581a9 9972149 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 |
---
library_name: transformers
license: apache-2.0
base_model:
- flammenai/flammen23-mistral-7B
datasets:
- flammenai/character-roleplay-DPO
---
![image/png](https://huggingface.co/nbeerbower/flammen13X-mistral-7B/resolve/main/flammen13x.png)
# flammen23-mistral-7B
A Mistral 7B LLM built from merging pretrained models and finetuning on [flammenai/character-roleplay-DPO](https://huggingface.co/datasets/flammenai/character-roleplay-DPO).
Flammen specializes in exceptional character roleplay, creative writing, and general intelligence
### Method
Finetuned using an A100 on Google Colab.
[Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) - [Maxime Labonne](https://huggingface.co/mlabonne)
### Configuration
System prompt, dataset formatting:
```python
def chatml_format(example):
# Format system
#system = ""
systemMessage = "Write a character roleplay dialogue using asterisk roleplay format based on the following character descriptions and scenario. (Each line in your response must be from the perspective of one of these characters)"
system = "<|im_start|>system\n" + systemMessage + "<|im_end|>\n"
# Format instruction
prompt = "<|im_start|>user\n" + example['input'] + "<|im_end|>\n<|im_start|>assistant\n"
# Format chosen answer
chosen = example['output'] + "<|im_end|>\n"
# Format rejected answer
rejected = example['rejected'] + "<|im_end|>\n"
return {
"prompt": system + prompt,
"chosen": chosen,
"rejected": rejected,
}
dataset = load_dataset("flammenai/character-roleplay-DPO")['train']
# Save columns
original_columns = dataset.column_names
# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
# Format dataset
dataset = dataset.map(
chatml_format,
remove_columns=original_columns
)
```
LoRA, model, and training settings:
```python
# LoRA configuration
peft_config = LoraConfig(
r=16,
lora_alpha=16,
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM",
target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)
# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
load_in_4bit=True
)
model.config.use_cache = False
# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
load_in_4bit=True
)
# Training arguments
training_args = TrainingArguments(
per_device_train_batch_size=2,
gradient_accumulation_steps=4,
gradient_checkpointing=True,
learning_rate=5e-5,
lr_scheduler_type="cosine",
max_steps=350,
save_strategy="no",
logging_steps=1,
output_dir=new_model,
optim="paged_adamw_32bit",
warmup_steps=100,
bf16=True,
report_to="wandb",
)
# Create DPO trainer
dpo_trainer = DPOTrainer(
model,
ref_model,
args=training_args,
train_dataset=dataset,
tokenizer=tokenizer,
peft_config=peft_config,
beta=0.1,
max_prompt_length=4096,
max_length=8192,
force_use_ref_model=True
)
``` |