flammenai
/

flammen23X-mistral-7B

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

flammen23X-mistral-7B / README.md

nbeerbower's picture

Update README.md

9972149 verified 7 months ago

|

3.29 kB

	---
	library_name: transformers
	license: apache-2.0
	base_model:
	- flammenai/flammen23-mistral-7B
	datasets:
	- flammenai/character-roleplay-DPO
	---

	![image/png](https://huggingface.co/nbeerbower/flammen13X-mistral-7B/resolve/main/flammen13x.png)

	# flammen23-mistral-7B

	A Mistral 7B LLM built from merging pretrained models and finetuning on [flammenai/character-roleplay-DPO](https://huggingface.co/datasets/flammenai/character-roleplay-DPO).
	Flammen specializes in exceptional character roleplay, creative writing, and general intelligence

	### Method

	Finetuned using an A100 on Google Colab.

	[Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) - [Maxime Labonne](https://huggingface.co/mlabonne)

	### Configuration

	System prompt, dataset formatting:

	```python
	def chatml_format(example):

	# Format system
	#system = ""
	systemMessage = "Write a character roleplay dialogue using asterisk roleplay format based on the following character descriptions and scenario. (Each line in your response must be from the perspective of one of these characters)"
	system = "<\|im_start\|>system\n" + systemMessage + "<\|im_end\|>\n"

	# Format instruction
	prompt = "<\|im_start\|>user\n" + example['input'] + "<\|im_end\|>\n<\|im_start\|>assistant\n"

	# Format chosen answer
	chosen = example['output'] + "<\|im_end\|>\n"

	# Format rejected answer
	rejected = example['rejected'] + "<\|im_end\|>\n"

	return {
	"prompt": system + prompt,
	"chosen": chosen,
	"rejected": rejected,
	}

	dataset = load_dataset("flammenai/character-roleplay-DPO")['train']

	# Save columns
	original_columns = dataset.column_names

	# Tokenizer
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	tokenizer.pad_token = tokenizer.eos_token
	tokenizer.padding_side = "left"

	# Format dataset
	dataset = dataset.map(
	chatml_format,
	remove_columns=original_columns
	)
	```

	LoRA, model, and training settings:

	```python
	# LoRA configuration
	peft_config = LoraConfig(
	r=16,
	lora_alpha=16,
	lora_dropout=0.05,
	bias="none",
	task_type="CAUSAL_LM",
	target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
	)

	# Model to fine-tune
	model = AutoModelForCausalLM.from_pretrained(
	model_name,
	torch_dtype=torch.bfloat16,
	load_in_4bit=True
	)
	model.config.use_cache = False

	# Reference model
	ref_model = AutoModelForCausalLM.from_pretrained(
	model_name,
	torch_dtype=torch.bfloat16,
	load_in_4bit=True
	)

	# Training arguments
	training_args = TrainingArguments(
	per_device_train_batch_size=2,
	gradient_accumulation_steps=4,
	gradient_checkpointing=True,
	learning_rate=5e-5,
	lr_scheduler_type="cosine",
	max_steps=350,
	save_strategy="no",
	logging_steps=1,
	output_dir=new_model,
	optim="paged_adamw_32bit",
	warmup_steps=100,
	bf16=True,
	report_to="wandb",
	)

	# Create DPO trainer
	dpo_trainer = DPOTrainer(
	model,
	ref_model,
	args=training_args,
	train_dataset=dataset,
	tokenizer=tokenizer,
	peft_config=peft_config,
	beta=0.1,
	max_prompt_length=4096,
	max_length=8192,
	force_use_ref_model=True
	)
	```