Update tokenizer_config.json #4
by minyichen - opened
The current chat_template adds an extra EOS token when `add_generation_prompt=False`. Please replace it with the correct chat_template to fix this behavior.
```python
from transformers import AutoTokenizer

tame_tokenizer = AutoTokenizer.from_pretrained("yentinglin/Llama-3-Taiwan-70B-Instruct")
message = [{"role": "user", "content": "How are you?"}]
print(tame_tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=False))
```
You can see an extra EOS token (`<|eot_id|>`) in the output:

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHow are you?<|eot_id|><|eot_id|>
```
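For reference, here is a minimal sketch of what a corrected template could look like, under the assumption that this repo inherited the original Meta-Llama-3 template, whose `{% else %}{{ eos_token }}` branch is what appends the extra token when `add_generation_prompt=False`. The template actually merged in this PR may differ; this only demonstrates the expected behavior:

```python
from transformers import AutoTokenizer

# Assumed fix, modeled on the corrected Meta-Llama-3 template: the buggy
# version ends with `{% else %}{{ eos_token }}`, which emits an extra EOS
# token whenever add_generation_prompt=False. This version drops that branch.
fixed_template = (
    "{% set loop_messages = messages %}"
    "{% for message in loop_messages %}"
    "{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n'"
    "+ message['content'] | trim + '<|eot_id|>' %}"
    "{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}"
    "{{ content }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }}"
    "{% endif %}"
)

tame_tokenizer = AutoTokenizer.from_pretrained("yentinglin/Llama-3-Taiwan-70B-Instruct")
tame_tokenizer.chat_template = fixed_template  # override for this session only
message = [{"role": "user", "content": "How are you?"}]
print(tame_tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=False))
# Expected: ...How are you?<|eot_id|>  (a single trailing <|eot_id|>)
```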
minyichen changed pull request title from "Upload tokenizer_config.json" to "Update tokenizer_config.json"
yentinglin changed pull request status to merged