See the axolotl config used for training (axolotl version 0.4.0):
```yaml
base_model: NousResearch/Meta-Llama-3-8B
load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: semeval2014_train.jsonl
    ds_type: json
    type:
      # JSONL file contains instruction, input, output fields per line.
      # This gets mapped to the equivalent axolotl tags.
      field_instruction: instruction
      field_input: input
      field_output: output
      # Format is used by axolotl to generate the prompt.
      format: |-
        [INST] {input} [/INST]

tokens: # add new control tokens from the dataset to the model
  - "[INST]"
  - "[/INST]"

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./lora-out

sequence_len: 4096
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

adapter: lora
lora_model_dir:
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save: # required when adding new tokens to LLaMA/Mistral
  - embed_tokens
  - lm_head

wandb_project: absa-semeval2014
wandb_entity: psimm
wandb_log_model:
wandb_name: llama-3-8B-semeval2014
hub_model_id: psimm/llama-3-8B-semeval2014

gradient_accumulation_steps: 1
micro_batch_size: 32
num_epochs: 4
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.0001

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 0.05
eval_table_size:
eval_table_max_new_tokens: 128
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```
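To make the dataset block above concrete, here is a rough sketch of how the `format` template turns one JSONL record into a prompt. The record contents below are invented for illustration (the real instruction and output follow the SemEval 2014 Task 4 annotations), and axolotl's internal prompt assembly, e.g. how the `output` field is appended as the completion and masked out via `train_on_inputs: false`, may differ in detail.

```python
# Hypothetical JSONL record; only the "input" value here is a real example sentence
# from this card, the other fields are placeholders.
record = {
    "instruction": "...",  # per-record instruction field (contents not shown in this card)
    "input": "The cheeseburger was tasty but the fries were soggy.",
    "output": "...",       # expected aspect/sentiment labels (contents not shown in this card)
}

# The format string from the config above
template = "[INST] {input} [/INST]"
prompt = template.format(input=record["input"])
print(prompt)
# [INST] The cheeseburger was tasty but the fries were soggy. [/INST]
```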
llama-3-8B-semeval2014
This model is a fine-tuned version of NousResearch/Meta-Llama-3-8B on the SemEval2014 Task 4 dataset. It achieves the following results on the evaluation set:
- Loss: 0.0695
- F1 Score: 82.13
For more details, see my accompanying article.
Intended uses & limitations
Aspect-based sentiment analysis of English text. Pass it review sentences wrapped in [INST] and [/INST] tags, like this: [INST]The cheeseburger was tasty but the fries were soggy.[/INST]
How to run
This adapter requires two new tokens, "[INST]" and "[/INST]", to be added to the tokenizer, and the base model's embedding matrix to be resized by 2 to accommodate them.
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

extra_tokens = ["[INST]", "[/INST]"]
base_model_id = "NousResearch/Meta-Llama-3-8B"

# Load the base model and grow its embedding matrix by the two new tokens
base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
base_model.resize_token_embeddings(base_model.config.vocab_size + len(extra_tokens))

# Register the control tokens with the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.add_special_tokens({"additional_special_tokens": extra_tokens})

# Load the LoRA adapter on top of the resized base model
model = PeftModel.from_pretrained(base_model, "psimm/llama-3-8B-semeval2014")

input_text = "[INST]The food was tasty[/INST]"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

gen_tokens = model.generate(
    input_ids,
    max_length=256,
    temperature=0.01,
)

# Remove the input tokens from the generated sequence
output_tokens = gen_tokens[:, input_ids.shape[1] :]
print(tokenizer.batch_decode(output_tokens, skip_special_tokens=True))
```
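Optionally, the adapter can be folded into the base weights so the merged model loads directly with AutoModelForCausalLM, without PEFT at inference time. A minimal sketch building on the `model` and `tokenizer` objects from the snippet above; the output directory name is arbitrary.

```python
# Fold the LoRA weights into the base model and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("llama-3-8B-semeval2014-merged")
tokenizer.save_pretrained("llama-3-8B-semeval2014-merged")
```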
Training and evaluation data
SemEval 2014 Task 4 reviews.
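Per the config, 5% of semeval2014_train.jsonl is held out for evaluation (val_set_size: 0.05). Axolotl performs this split internally; the sketch below only illustrates an equivalent 95/5 split with the datasets library, using the training seed from the hyperparameters.

```python
from datasets import load_dataset

# Illustrative only: reproduce an equivalent 95/5 train/validation split.
dataset = load_dataset("json", data_files="semeval2014_train.jsonl", split="train")
splits = dataset.train_test_split(test_size=0.05, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
print(len(train_ds), len(eval_ds))
```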
Training procedure
Training hyperparameters
The following hyperparameters were used during training; the batch-size figures are related as sketched after the list:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4
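The effective batch size is simply the per-device batch size multiplied by the number of GPUs and the gradient-accumulation steps; a trivial check using only values from the list and config above:

```python
# How the reported batch sizes relate
micro_batch_size = 32            # per-device train/eval batch size
num_devices = 2                  # multi-GPU training
gradient_accumulation_steps = 1  # from the axolotl config

total_train_batch_size = micro_batch_size * num_devices * gradient_accumulation_steps
assert total_train_batch_size == 64
```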
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 2.5408 | 0.0112 | 1 | 2.2742 |
| 0.1159 | 0.2022 | 18 | 0.1026 |
| 0.1028 | 0.4045 | 36 | 0.0762 |
| 0.0813 | 0.6067 | 54 | 0.0709 |
| 0.0908 | 0.8090 | 72 | 0.0665 |
| 0.0431 | 1.0112 | 90 | 0.0639 |
| 0.0275 | 1.2135 | 108 | 0.0663 |
| 0.0224 | 1.4157 | 126 | 0.0659 |
| 0.0349 | 1.6180 | 144 | 0.0637 |
| 0.0281 | 1.8202 | 162 | 0.0589 |
| 0.0125 | 2.0225 | 180 | 0.0592 |
| 0.0088 | 2.2247 | 198 | 0.0682 |
| 0.0076 | 2.4270 | 216 | 0.0666 |
| 0.01 | 2.6292 | 234 | 0.0654 |
| 0.0131 | 2.8315 | 252 | 0.0704 |
| 0.0075 | 3.0337 | 270 | 0.0679 |
| 0.002 | 3.2360 | 288 | 0.0688 |
| 0.0029 | 3.4382 | 306 | 0.0692 |
| 0.0009 | 3.6404 | 324 | 0.0694 |
| 0.0064 | 3.8427 | 342 | 0.0695 |
Framework versions
- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.2.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1