Why is there more than one token ID for some tokens?
#6 · by Sev777 · opened
# sample code to reproduce the bug
>>> from transformers import LlamaTokenizer
>>> tokenizer = LlamaTokenizer.from_pretrained('huggingface/open_llama_7b')
>>> tokenizer.encode('London')
[1, 2516]
>>> tokenizer.decode(2516)
'London'
>>> tokenizer.decode(20719)
'London'
>>> tokenizer.decode(2516)==tokenizer.decode(20719)
True
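This is expected with SentencePiece-based tokenizers like Llama's: the vocabulary stores a word-initial piece with a leading word-boundary marker (`▁`, U+2581) and a separate mid-word continuation piece, and `decode` strips that marker, so two distinct IDs can yield the same surface string. A minimal toy sketch of the idea, assuming (hypothetically) that 2516 is the `▁London` piece and 20719 the bare `London` piece — the mapping here is illustrative, not read from the real vocabulary:

```python
# Toy vocabulary mimicking SentencePiece pieces. The ID-to-piece
# assignment is an assumption for illustration only.
vocab = {
    2516: "\u2581London",   # word-initial piece: "▁London" (after a space)
    20719: "London",        # continuation piece (e.g. inside another word)
}

def decode(token_id: int) -> str:
    # Mimic tokenizer.decode for a single id: the ▁ marker becomes a
    # space, which is then stripped at the string boundary.
    return vocab[token_id].replace("\u2581", " ").strip()

print(decode(2516) == decode(20719))  # True  — same decoded surface form
print(vocab[2516] == vocab[20719])    # False — distinct underlying pieces
```

To see the distinction on the real tokenizer, `tokenizer.convert_ids_to_tokens([2516, 20719])` shows the raw pieces (including the `▁` marker) instead of the decoded strings.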