File size: 3,294 Bytes

---
library_name: transformers
license: apache-2.0
base_model:
- flammenai/flammen23-mistral-7B
datasets:
- flammenai/character-roleplay-DPO
---

![image/png](https://huggingface.co/nbeerbower/flammen13X-mistral-7B/resolve/main/flammen13x.png)

# flammen23-mistral-7B

A Mistral 7B LLM built from merging pretrained models and finetuning on [flammenai/character-roleplay-DPO](https://huggingface.co/datasets/flammenai/character-roleplay-DPO). 
Flammen specializes in exceptional character roleplay, creative writing, and general intelligence

### Method

Finetuned using an A100 on Google Colab.

[Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) - [Maxime Labonne](https://huggingface.co/mlabonne)

### Configuration

System prompt, dataset formatting:

```python
def chatml_format(example):

    # Format system
    #system = ""
    systemMessage = "Write a character roleplay dialogue using asterisk roleplay format based on the following character descriptions and scenario. (Each line in your response must be from the perspective of one of these characters)"
    system = "<|im_start|>system\n" + systemMessage + "<|im_end|>\n"

    # Format instruction
    prompt = "<|im_start|>user\n" + example['input'] + "<|im_end|>\n<|im_start|>assistant\n"

    # Format chosen answer
    chosen = example['output'] + "<|im_end|>\n"

    # Format rejected answer
    rejected = example['rejected'] + "<|im_end|>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }

dataset = load_dataset("flammenai/character-roleplay-DPO")['train']

# Save columns
original_columns = dataset.column_names

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Format dataset
dataset = dataset.map(
    chatml_format,
    remove_columns=original_columns
)
```

LoRA, model, and training settings:

```python
# LoRA configuration
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments
training_args = TrainingArguments(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=350,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=4096,
    max_length=8192,
    force_use_ref_model=True
)
```