Training mistake: the model is ruined.
It appears you made a mistake when training the LoRA adapter. You added the Llama 2 EOS token `</s>` at the end of every message; however, it does not tokenize to the actual EOS token, since that token doesn't exist in the Llama 3 vocab. Instead, it tokenizes into the literal sequence for `</s>`, which in Llama 3 is `</` (4005), `s` (82), and `>` (29). This will also cause the `</s>` sequence to appear at the end of every AI response.
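A quick way to see the difference, assuming access to both gated base models on the Hugging Face Hub, is to compare the two tokenizers directly (a minimal sketch, not the exact steps used here):

```python
# Sketch: compare how the Llama 2 and Llama 3 tokenizers handle the string "</s>".
from transformers import AutoTokenizer

llama2_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
llama3_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# In Llama 2, "</s>" is the EOS special token and maps to a single id.
print(llama2_tok.eos_token, llama2_tok.convert_tokens_to_ids("</s>"))

# In Llama 3, "</s>" is not a special token, so it is split into ordinary
# sub-tokens and the model ends up emitting them as literal text.
print(llama3_tok("</s>", add_special_tokens=False).input_ids)
print(llama3_tok.eos_token)  # the token Llama 3 actually uses as EOS
```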
Oh, thanks for looking at this! Here is the config I used:
```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: jondurbin/airoboros-3.2
    type: sharegpt
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./out
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project: llama3
wandb_entity: saucam
wandb_watch: all
wandb_name: llama3-cosmo-run-1
wandb_log_model: "end"
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```
Is it because of the pad_token added at the end?
The output seems to have the token at the start:
```
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Downloading shards: 100%|██████████████████████████████████████| 2/2 [00:00<00:00, 7.95it/s]
Loading checkpoint shards: 100%|██████████████████████████████████████| 2/2 [00:12<00:00, 6.42s/it]
<s>USER: What is a large language model?
ASSISTANT: A large language model is a type of machine learning model that is trained on a large dataset of text. It is designed to understand the context and meaning of text, and to generate human-like responses to prompts. Some examples of large language models include GPT-3, BERT, and GPT-J.
USER: What is GPT-3?
ASSISTANT: GPT-3, or Generative Pre-trained Transformer 3, is a large language model developed by OpenAI. It is considered the most advanced and powerful language model to date, with 175 billion parameters. GPT-3 can generate human-like text, complete tasks like summarizing and translating, and even write code.
USER: How can GPT-3 be used?
ASSISTANT: GPT-3 can be used for a variety of tasks, including:
- Generating text: GPT-3 can generate human-like text based on a prompt, making it useful for tasks like writing articles, creating ad copy, and more.
- Summarizing text: GPT-3 can summarize long documents or articles into a shorter, more concise version.
- Translating text
```
It could be. I don't think `<|end_of_text|>` is a valid token either, though. I am not very familiar with dataset handling in axolotl.
Other than that, there is no other place where I added any tokens. How did you find that "You added the Llama 2 EOS token at the end of every message"?
I downloaded the model, converted it, ran it, and observed the outputs.
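For reference, "downloaded the model, converted it, ran it" can be approximated with `peft` and `transformers`; the sketch below uses a placeholder adapter path rather than the actual repo from this thread:

```python
# Rough sketch of merging a LoRA adapter into the base model and sampling from it.
# "path/to/lora-adapter" is a placeholder, not the actual adapter discussed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B"
tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")

# Fold the LoRA weights into the base so the merged model can be run (or exported) standalone.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()

prompt = "USER: What is a large language model?\nASSISTANT:"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
# Keep special tokens visible so a stray literal "</s>" at the end of a reply is easy to spot.
print(tok.decode(out[0], skip_special_tokens=False))
```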
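As for the `<|end_of_text|>` question, a quick probe of the Llama 3 tokenizer (a sketch, using the base model ID from the config above) shows which of these strings are real single special tokens:

```python
# Sketch: check which strings the Llama 3 tokenizer treats as a single special token.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
print(tok.bos_token, tok.eos_token)  # expected: <|begin_of_text|> <|end_of_text|>

for s in ["<|end_of_text|>", "</s>", "<s>"]:
    ids = tok(s, add_special_tokens=False).input_ids
    # A genuine special token encodes to exactly one id; anything else splits into several.
    print(f"{s!r} -> {ids}")
```

If `<|end_of_text|>` comes back as a single id, the pad_token choice is at least a real token, and the literal `</s>` strings in the data remain the separate issue described in the first comment.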