
Questions about the settings of eos_token, bos_token, and pad_token

#7
by cl-modelcloud - opened

The three token settings in the tokenizer_config.json file are as follows:

"eos_token": "<|end_of_text|>",
"bos_token": "<|begin_of_text|>",
"pad_token": "<|end_of_text|>",

but in the config.json file they are:

"bos_token_id": 0,
"eos_token_id": 11,
"pad_token_id": 0,

These three token IDs correspond to the following tokens:
"bos_token_id": ">>TITLE<<",
"eos_token_id": "<|end_of_text|>",
"pad_token_id": ">>TITLE<<",

Which setting is correct?
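
For reference, here is a minimal sketch of how one could check what each file actually resolves to by loading both with transformers and comparing (the model id tiiuae/falcon-mamba-7b is an assumption here):

```python
from transformers import AutoTokenizer, AutoConfig

model_id = "tiiuae/falcon-mamba-7b"  # assumed model id for this repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

# IDs the tokenizer derives from tokenizer_config.json
print("tokenizer:", tokenizer.bos_token_id, tokenizer.eos_token_id, tokenizer.pad_token_id)

# IDs stored in config.json
print("config:   ", config.bos_token_id, config.eos_token_id, config.pad_token_id)

# Map the config.json IDs back to token strings to see which tokens they point at
ids = [i for i in (config.bos_token_id, config.eos_token_id, config.pad_token_id) if i is not None]
print(tokenizer.convert_ids_to_tokens(ids))
```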

Technology Innovation Institute org

Hi,
Thanks for spotting this ambiguity. It has been corrected now.

Gkunsch changed discussion status to closed
