Can you post the tokenizers?

by jackboot - opened Jul 12

Jul 12

I'd like to use this with llama.cpp HF and currently I cannot. I can manually switch the config to chatml but I have no idea if you assigned those tokens a particular value or if they're being split apart. They aren't very big.

gghfez

Owner Jul 12

I've uploaded the tokenizer.json

jackboot

Jul 13

•

edited Jul 13

Thanks, there's also configs that go with it. I suppose that at least I can grind out the jsons this way.

heh, looking at your GGUF, it has incorrect metadata and still uses as eos token. end_of_turn is also set as the EOT.

gghfez

Owner Jul 15

You're right, this one's pretty broken. I've created a V2 here (including tokenizer and tokenizer_config:

https://huggingface.co/gghfez/gemma-2-27b-rp-c2-v2-GGUF

This one was trained with the "gemma2 chatml" template
{{ bos_token }}{% for message in mess...

Still has the issue with tags not tokenized.

Working for me in SillyTavern with the gemma2 and chatml

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment