Tokenizer fix #1
opened by justinbarton

No description provided.
Presently, loading the tokenizer via

tokeniser = T5Tokenizer.from_pretrained("Exscientia/IgT5", do_lower_case=False)

yields the following error:

ValueError: Non-consecutive added token '<extra_id_99>' found. Should have index 128 but has index 28 in saved vocabulary.

This PR should resolve the issue.
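Until the fix lands on main, a minimal sketch of a workaround is to load the tokenizer files directly from this PR's ref on the Hub (assuming the corrected files live on refs/pr/1, the Hub ref for PR #1):

```python
from transformers import T5Tokenizer

# Load the tokenizer from this pull request's ref instead of main,
# so the corrected added-token indices are picked up.
tokeniser = T5Tokenizer.from_pretrained(
    "Exscientia/IgT5",
    do_lower_case=False,
    revision="refs/pr/1",  # ref for this PR; drop once the fix is on main
)
```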
justinbarton changed pull request status to open
Hi @justinbarton, thank you for the interest in our work! What version of transformers are you using? I tried this line in a Colab notebook with both the transformers version we developed in (4.35.2) as well as the latest version (4.39.3), and they both imported the tokeniser without any errors.
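For anyone comparing environments, a quick way to check the installed version (a minimal sketch):

```python
import transformers

# 4.35.2 and 4.39.3 both load the IgT5 tokeniser without the error above.
print(transformers.__version__)
```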
How odd. I was using 4.30.2.
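For readers hitting the same ValueError on an older release, the versions reported in this thread suggest that upgrading past 4.30.2 resolves it (a sketch, assuming a pip-managed environment):

```python
# First upgrade, e.g.:  pip install --upgrade "transformers>=4.35.2"
from transformers import T5Tokenizer

# Loads cleanly on 4.35.2 and 4.39.3, per the maintainer's test above.
tokeniser = T5Tokenizer.from_pretrained("Exscientia/IgT5", do_lower_case=False)
```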
exs-fdreyer changed pull request status to closed