Update tokenizer_config.json
#12
by freaksamael · opened
No description provided.
For English use there is no need for the Chinese characters; also, if possible, it would be great if the maximum size were increased. In general, we need to tokenize relevant text in a 1024-token window. Thank you!
Hi, thanks for your interest.
The max length was 512 during training, so the model cannot process sequences longer than 512 tokens. It simply uses the first 512 tokens and ignores the rest.
Therefore, increasing the max length setting has no effect: the model still only attends to the first 512 tokens.
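For reference, here is a minimal sketch of how to truncate inputs to the trained limit with a Hugging Face tokenizer (the repo id below is a placeholder; substitute the actual model name):

```python
from transformers import AutoTokenizer

# Placeholder repo id; replace with the actual model.
tokenizer = AutoTokenizer.from_pretrained("some-org/some-model")

long_text = "example sentence " * 500  # deliberately longer than 512 tokens

# Truncate to the model's trained max length of 512 tokens.
# Anything beyond position 512 would be ignored by the model anyway,
# so raising model_max_length in tokenizer_config.json does not help.
encoded = tokenizer(long_text, truncation=True, max_length=512)
print(len(encoded["input_ids"]))  # <= 512
```

If you need a 1024-token window, the model itself would have to be retrained (or fine-tuned) with that longer context; changing the tokenizer config alone is not enough.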