Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Thanks for the model!
I encounter the following when loading the tokenizer:
from transformers import AutoTokenizer
checkpoint_path = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
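For reference, one way to check whether the embedding matrix already covers the added tokens is something like the following (standard transformers calls; loading the model here is only for the comparison):

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_path = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)
model = AutoModelForCausalLM.from_pretrained(checkpoint_path)

# Vocab size seen by the tokenizer (base vocab + added special tokens)
print(len(tokenizer))
# Number of rows in the model's input embedding matrix
print(model.get_input_embeddings().weight.shape[0])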
Does this mean I would have to unfreeze the embedding layers when fine-tuning with LoRA?
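For context, what I have in mind is something like the sketch below, using peft's LoraConfig with modules_to_save so the embeddings (and LM head) are trained alongside the adapters; the target_modules names are just my guess and would need to match the actual model:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["qkv_proj", "o_proj"],  # assumed module names, adjust for the model
    # Train the token embeddings and LM head in full, in addition to the LoRA adapters
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()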
Thanks!
Have you found a solution? If yes, please share.
@ilyassacha
Sorry for the slow reply. I believe it was fixed by this commit (https://github.com/huggingface/transformers/commit/38da0faa9ff6b800debf59386840d41f199bfd74), and upgrading to transformers 4.44.0 gets rid of the warning.
I could be wrong, but I think this happened simply because Phi-3 added extra tokens on top of Llama 2's tokenizer when training. If you add new tokens yourself and save the tokenizer, you get the same behavior (before that commit).
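For example, something along these lines reproduces the situation: add a token to an existing tokenizer, save it, and on transformers versions before that commit, reloading it prints the same warning. The token name and save path here are made up:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Add a new special token on top of the base vocabulary (placeholder name)
tokenizer.add_special_tokens({"additional_special_tokens": ["<|my_new_token|>"]})
# Grow the embedding matrix so the new token gets a (randomly initialized) row
model.resize_token_embeddings(len(tokenizer))

tokenizer.save_pretrained("./tokenizer-with-extra-tokens")
# Reloading this tokenizer used to print the same warning before the fix
reloaded = AutoTokenizer.from_pretrained("./tokenizer-with-extra-tokens")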