ValueError: Non-consecutive added token '<mask>' found. Should have index 32005 but has index 32004 in saved vocabulary.

#4
by azeemarshad - opened

The following code, among other snippets on the page, gives this error. I changed the checkpoint name to "almanach/camembert-large" and still get the same issue. Any idea how to fix this? Thank you.

from transformers import CamembertModel, CamembertTokenizer

# You can replace "camembert-base" with any other model from the table, e.g. "camembert/camembert-large".
tokenizer = CamembertTokenizer.from_pretrained("camembert/camembert-large")
camembert = CamembertModel.from_pretrained("camembert/camembert-large")

camembert.eval()  # disable dropout (or leave in train mode to finetune)
ALMAnaCH (Inria) org

I think it might be an issue with older versions of transformers. I just tested several versions, and it starts to break at v4.34. My quick advice is to upgrade. If you can't, try downloading the model locally and deleting the added_token.json file; it should work then.
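For anyone who can't upgrade, here is a minimal sketch of the local workaround described above. The helper name `remove_added_token_files` is hypothetical, and the file name is an assumption: the reply says added_token.json, but the file is often named added_tokens.json in Hub repos, so the sketch checks for both.

```python
from pathlib import Path

# Assumption: the file may be named either way, so both names are checked.
CANDIDATE_NAMES = ("added_token.json", "added_tokens.json")


def remove_added_token_files(model_dir):
    """Delete added-token files from a local model directory.

    Returns the list of file names that were actually removed.
    """
    removed = []
    for name in CANDIDATE_NAMES:
        path = Path(model_dir) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed


# Possible usage (requires network access and the huggingface_hub package;
# the local directory name is only an example):
#
#   from huggingface_hub import snapshot_download
#   from transformers import CamembertTokenizer
#
#   local_dir = snapshot_download("camembert/camembert-large",
#                                 local_dir="camembert-large-local")
#   remove_added_token_files(local_dir)
#   tokenizer = CamembertTokenizer.from_pretrained(local_dir)
```

The helper only touches the local copy, so the Hub repository and the shared cache entry for other projects are not the intended target. Whether this resolves the error on a given transformers version is untested here.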
