Error with Tokenizer
Hello,
I'm currently fine-tuning the "Mistral-7B-Instruct-v0.1" model and I've encountered an issue that I haven't faced before when using the AutoTokenizer from Transformers. Here's the code I'm using:
tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",  # reduces memory usage
    add_eos_token=True,
    add_bos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token
However, I'm receiving the following error:
OSError: Can't load tokenizer for 'mistralai/Mistral-7B-Instruct-v0.1'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'mistralai/Mistral-7B-Instruct-v0.1' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.
Does anyone know how to resolve this issue?
I am facing the same issue. Have you found a solution?
I'm not sure if it's relevant, but I'm temporarily utilizing the "Mistral-7B-v0.1" tokenizer until a solution is found. Please keep me informed if there are any updates.
Hi, I have the same error. Using flash_attn==2.5.8 gets rid of the tokenizer error, but it introduces a new module import error when downloading models.
Requirements to reproduce:
flash_attn==2.5.8
transformers==4.41.2
torch==2.2.2
requests==2.31.0
mlflow==2.13.1
bitsandbytes==0.42.0
accelerate==0.31.0
Databricks 14.3 ML cluster with CUDA version 11.8
Has anyone got a fix?
This isn't a library error. I was facing the same issue until I realized I hadn't logged in to Hugging Face:
from huggingface_hub import login
login(token="your_access_token_here")
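If you would rather not hard-code the token, here is a minimal sketch that reads it from the environment and passes it to from_pretrained explicitly. The helper names resolve_hf_token and load_tokenizer are hypothetical, made up for this example. It assumes a transformers release that accepts the token= argument (older releases used use_auth_token= instead), and that the token is exported as HF_TOKEN or the older HUGGING_FACE_HUB_TOKEN.

```python
import os


def resolve_hf_token(env=None):
    """Return the first non-empty Hugging Face token found in env, or None.

    Hypothetical helper: checks HF_TOKEN first, then the older
    HUGGING_FACE_HUB_TOKEN name.
    """
    env = os.environ if env is None else env
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = (env.get(name) or "").strip()
        if value:
            return value
    return None


def load_tokenizer(model_id, token=None):
    """Load a tokenizer, passing the access token explicitly (hypothetical helper)."""
    # Imported lazily so resolve_hf_token works without transformers installed.
    from transformers import AutoTokenizer

    # Assumption: this transformers version accepts `token=`.
    return AutoTokenizer.from_pretrained(model_id, token=token or resolve_hf_token())
```

With the token exported in the shell or notebook environment, load_tokenizer("mistralai/Mistral-7B-Instruct-v0.1") should then pick it up without a separate login() call. This only helps if the account behind the token actually has access to the repository.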
I'm trying to deploy the model on an AKS cluster by adding the env variable 'HF_TOKEN' to mistral-7b.yaml, but I'm still getting the error '401 Client Error: Unauthorized for url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/resolve/main/adapter_config.json'. Any advice on this? Thanks
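It's hard to say from the 401 alone what is wrong, but one way to narrow it down is to check, from inside the running container, whether the process actually sees HF_TOKEN and whether the value looks intact. The sketch below is only an assumption about the likely causes (variable not reaching the pod, stray whitespace or quotes picked up from the YAML or a secret). token_diagnostics and check_login are hypothetical helper names, and check_login needs network access and the huggingface_hub package.

```python
import os


def token_diagnostics(env=None):
    """Describe the state of HF_TOKEN without printing the token itself."""
    env = os.environ if env is None else env
    raw = env.get("HF_TOKEN")
    if raw is None:
        return "HF_TOKEN is not set in this process"
    if raw != raw.strip():
        # e.g. a trailing newline from a secret created out of a file
        return "HF_TOKEN has leading or trailing whitespace"
    if not raw:
        return "HF_TOKEN is set but empty"
    if raw[0] in "'\"" or raw[-1] in "'\"":
        return "HF_TOKEN appears to include literal quote characters"
    return f"HF_TOKEN looks well-formed ({len(raw)} characters)"


def check_login():
    """Ask the Hub which account the token belongs to (requires network)."""
    # Imported lazily so token_diagnostics works without huggingface_hub installed.
    from huggingface_hub import whoami

    # Raises an HTTP error if the token is missing or rejected.
    return whoami(token=os.environ.get("HF_TOKEN"))["name"]


if __name__ == "__main__":
    print(token_diagnostics())
```

If the diagnostics look fine and check_login() returns your username, the token itself is valid, and the remaining thing to verify is that this account has been granted access to the model repository on its Hugging Face page.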