Update tokenizer_config.json
#12
by freaksamael · opened
No description provided.
For English use there is no need for the Chinese characters; also, if possible, it would be great if the maximum size were increased. In general, we need to tokenize relevant text in a 1024-token window. Thank you!
Hi, thanks for your interest.
The max length was 512 during training, so the model cannot process sequences longer than 512 tokens. It simply uses the first 512 tokens and ignores the rest.
Therefore, increasing the max length setting has no effect: the model still only attends to the first 512 tokens.
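For reference, here is a minimal sketch of how to truncate inputs to the trained limit with a Hugging Face tokenizer (the repo id below is a placeholder; substitute the actual model name):

```python
from transformers import AutoTokenizer

# Placeholder repo id; replace with the actual model.
tokenizer = AutoTokenizer.from_pretrained("some-org/some-model")

long_text = "example sentence " * 500  # deliberately longer than 512 tokens

# Truncate to the model's trained max length of 512 tokens.
# Anything beyond position 512 would be ignored by the model anyway,
# so raising model_max_length in tokenizer_config.json does not help.
encoded = tokenizer(long_text, truncation=True, max_length=512)
print(len(encoded["input_ids"]))  # <= 512
```

If you need a 1024-token window, the model itself would have to be retrained (or fine-tuned) with that longer context; changing the tokenizer config alone is not enough.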