
Questions about the settings of eos_token, bos_token, and pad_token

#7
by cl-modelcloud - opened

The three token settings in the tokenizer_config.json file are as follows:

"eos_token": "<|end_of_text|>",
"bos_token": "<|begin_of_text|>",
"pad_token": "<|end_of_text|>",

but in the config.json file they are:

"bos_token_id": 0,
"eos_token_id": 11,
"pad_token_id": 0,

These three token IDs correspond to the following tokens:
"bos_token_id": ">>TITLE<<",
"eos_token_id": "<|end_of_text|>",
"pad_token_id": ">>TITLE<<",

Which setting is correct?
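
For reference, here is a minimal sketch of how one could check what each file actually resolves to by loading both with transformers and comparing (the model id tiiuae/falcon-mamba-7b is an assumption here):

```python
from transformers import AutoTokenizer, AutoConfig

model_id = "tiiuae/falcon-mamba-7b"  # assumed model id for this repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

# IDs the tokenizer derives from tokenizer_config.json
print("tokenizer:", tokenizer.bos_token_id, tokenizer.eos_token_id, tokenizer.pad_token_id)

# IDs stored in config.json
print("config:   ", config.bos_token_id, config.eos_token_id, config.pad_token_id)

# Map the config.json IDs back to token strings to see which tokens they point at
ids = [i for i in (config.bos_token_id, config.eos_token_id, config.pad_token_id) if i is not None]
print(tokenizer.convert_ids_to_tokens(ids))
```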

Technology Innovation Institute org

Hi,
Thanks for spotting this ambiguity. It has been corrected now.

Gkunsch changed discussion status to closed
