Tokenizer fix #1
opened by justinbarton

No description provided.
Presently, loading the tokenizer via

tokeniser = T5Tokenizer.from_pretrained("Exscientia/IgT5", do_lower_case=False)

yields the following error:

ValueError: Non-consecutive added token '<extra_id_99>' found. Should have index 128 but has index 28 in saved vocabulary.

This PR should resolve the issue.
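Until the fix lands on main, a minimal sketch of a workaround is to load the tokenizer files directly from this PR's ref on the Hub (assuming the corrected files live on refs/pr/1, the Hub ref for PR #1):

```python
from transformers import T5Tokenizer

# Load the tokenizer from this pull request's ref instead of main,
# so the corrected added-token indices are picked up.
tokeniser = T5Tokenizer.from_pretrained(
    "Exscientia/IgT5",
    do_lower_case=False,
    revision="refs/pr/1",  # ref for this PR; drop once the fix is on main
)
```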
justinbarton changed pull request status to open
Hi @justinbarton, thank you for the interest in our work! What version of transformers are you using? I tried this line in a Colab notebook with both the transformers version we developed in (4.35.2) as well as the latest version (4.39.3), and they both imported the tokeniser without any errors.
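For anyone comparing environments, a quick way to check the installed version (a minimal sketch):

```python
import transformers

# 4.35.2 and 4.39.3 both load the IgT5 tokeniser without the error above.
print(transformers.__version__)
```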
How odd. I was using 4.30.2.
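For readers hitting the same ValueError on an older release, the versions reported in this thread suggest that upgrading past 4.30.2 resolves it (a sketch, assuming a pip-managed environment):

```python
# First upgrade, e.g.:  pip install --upgrade "transformers>=4.35.2"
from transformers import T5Tokenizer

# Loads cleanly on 4.35.2 and 4.39.3, per the maintainer's test above.
tokeniser = T5Tokenizer.from_pretrained("Exscientia/IgT5", do_lower_case=False)
```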
exs-fdreyer changed pull request status to closed