Error while loading Mixtral-8x7B-Instruct-v0.1 tokenizer with AutoTokenizer.from_pretrained

#229
by Amaia-C - opened

Since yesterday, when trying to load Mixtral-8x7B-Instruct-v0.1's tokenizer with

from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1", token=MY_TOKEN)

I get the following error: "Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3"

These two lines worked until yesterday, and they still work for the 8x22B version of the model (when loading the "mistralai/Mixtral-8x22B-Instruct-v0.1" tokenizer).

edit: I guess this may be linked to "Align tokenizer with mistral-common (#225)". Should we stop using the Hugging Face tokenizer for Mixtral-8x7B-Instruct-v0.1 entirely, then? See the sketch below for what I understand the alternative to be.
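
For reference, here is roughly what the mistral-common route would look like, sketched from that library's README (the choice of MistralTokenizer.v1() for Mixtral-8x7B-Instruct-v0.1 is my assumption, not something confirmed in this thread):

from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# v1 is, as far as I know, the tokenizer version used by Mixtral-8x7B-Instruct-v0.1
tokenizer = MistralTokenizer.v1()

# Tokenize a chat request the way mistral-common expects it
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(messages=[UserMessage(content="Hello, how are you?")])
)
print(tokenized.tokens)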

Mistral AI_ org

Hi Amaia, you need to update your transformers installation; there were recent changes to how tokenizers are handled in general. Take a look at: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/discussions/143#66867c5d9be89f9070d5d6f7
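
Concretely, something like this should do it (a minimal sketch, assuming a pip-based environment and that MY_TOKEN is your Hugging Face access token, as in your snippet):

# In the environment where the error occurs (assuming pip is used):
#   pip install --upgrade transformers tokenizers

from transformers import AutoTokenizer

# After the upgrade, the original loading call should work again
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    token=MY_TOKEN,
)
print(tokenizer("Hello, how are you?")["input_ids"])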
