Adds the tokenizer configuration file
#3 opened by lysandre (HF staff)
The tokenizer configuration file is missing or incorrect, which leads to unforeseen errors since the migration of the canonical models.
Refer to the following issue for more information: transformers#29050
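Here is a minimal sketch of why a missing configuration file matters. It assumes a simplified loader: the hypothetical helper `resolve_model_max_length` reads `model_max_length` from a local `tokenizer_config.json`. When the file or key is absent, it falls back to `int(1e30)`, which mirrors the `VERY_LARGE_INTEGER` sentinel `transformers` uses when no limit is configured. This is not the library's actual loading code, and the value 512 is purely illustrative, not XLNet's setting.

```python
import json
import os
import tempfile

# Assumption: mirrors the sentinel transformers uses when no length limit is set.
VERY_LARGE_INTEGER = int(1e30)


def resolve_model_max_length(config_dir):
    """Hypothetical helper: read model_max_length from tokenizer_config.json,
    falling back to VERY_LARGE_INTEGER when the file or the key is missing."""
    path = os.path.join(config_dir, "tokenizer_config.json")
    if not os.path.isfile(path):
        return VERY_LARGE_INTEGER
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    value = config.get("model_max_length")
    return VERY_LARGE_INTEGER if value is None else value


with tempfile.TemporaryDirectory() as repo_dir:
    # No tokenizer_config.json yet, so the sentinel value is returned.
    print(resolve_model_max_length(repo_dir))
    # Add a config file with an explicit, illustrative limit.
    with open(os.path.join(repo_dir, "tokenizer_config.json"), "w", encoding="utf-8") as f:
        json.dump({"model_max_length": 512}, f)
    print(resolve_model_max_length(repo_dir))
```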
This is the code that currently fails:
>>> from transformers import AutoTokenizer
>>> previous_tokenizer = AutoTokenizer.from_pretrained("xlnet-large-cased")
>>> current_tokenizer = AutoTokenizer.from_pretrained("xlnet/xlnet-large-cased")
>>> print(previous_tokenizer.model_max_length, current_tokenizer.model_max_length)
1000000000000000019884624838656 1000000000000000019884624838656
This is the result after the fix:
>>> from transformers import AutoTokenizer
>>> previous_tokenizer = AutoTokenizer.from_pretrained("xlnet-large-cased")
>>> current_tokenizer = AutoTokenizer.from_pretrained("xlnet/xlnet-large-cased")
>>> print(previous_tokenizer.model_max_length, current_tokenizer.model_max_length)
1000000000000000019884624838656 1000000000000000019884624838656
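The large value printed above appears to be `int(1e30)`, the sentinel that recent `transformers` versions assign to `model_max_length` when no limit is configured. It is not exactly 10**30 because `1e30` is a double-precision float. The sketch below shows this. `has_explicit_max_length` is a hypothetical helper for illustration, not a `transformers` API.

```python
# The sentinel value: converting the float 1e30 to int exposes the nearest
# double-precision value rather than exactly 10**30.
SENTINEL = int(1e30)


def has_explicit_max_length(model_max_length, sentinel=SENTINEL):
    """Hypothetical helper: True when the tokenizer reports a real length limit."""
    return model_max_length < sentinel


print(SENTINEL)                      # 1000000000000000019884624838656
print(SENTINEL == 10**30)            # False
print(SENTINEL - 10**30)             # 19884624838656
print(has_explicit_max_length(SENTINEL))  # False
```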