
Minor issues with the chat template during fine-tuning

#3 by Maghoumi

I attempted to do LoRA fine-tuning of this model and noticed some minor issues with the chat template that I had to work around.

For creating training samples, I did:

# tokenizer loaded beforehand, e.g. via AutoTokenizer.from_pretrained(...)
text = tokenizer.apply_chat_template(
    [
        system_role,  # a {"role": "system", "content": ...} dict
        {"role": "user", "content": question},
        {"role": "assistant", "content": output},
    ],
    tokenize=False,
    add_generation_prompt=False,
)

Unexpectedly, I noticed an extra <extra_id_1>Assistant being added to the end. It's as if the logic thought I had set add_generation_prompt=True. This caused the trained model to keep outputting <extra_id_1>Assistant after every line.

I assume that instead of <extra_id_1>Assistant it should have appended the end-of-sentence token (i.e. </s>)?
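
For anyone hitting the same thing, here is a minimal sketch of the workaround I'm describing (assuming the trailing string is exactly <extra_id_1>Assistant\n and that tokenizer.eos_token is the right terminator for this model):

GEN_PROMPT = "<extra_id_1>Assistant\n"

# Drop the spurious generation prompt from the training sample and
# terminate it with the EOS token instead.
if text.endswith(GEN_PROMPT):
    text = text[: -len(GEN_PROMPT)] + tokenizer.eos_token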

Looking at the jinja template, it wasn't immediately obvious to me what happened. At the end of this line, there is an extra {{'<extra_id_1>Assistant\n'}} which may be the culprit.
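
If that line is indeed the cause, one possible fix, sketched here as a load-time patch (the replacement strings are my assumption about the template's contents, not verified), would be to make the trailing cue conditional on the last turn's role:

# Hypothetical patch: only emit the generation cue when the last
# message is not already from the assistant. The literal being
# replaced must match the actual template exactly.
tokenizer.chat_template = tokenizer.chat_template.replace(
    "{{'<extra_id_1>Assistant\\n'}}",
    "{% if messages[-1]['role'] != 'assistant' %}"
    "{{'<extra_id_1>Assistant\\n'}}"
    "{% endif %}",
)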

Let me know if I can provide more details.

We had not considered the last turn in messages being from "assistant" when designing the jinja template. We expect the last turn to be from "user", in which case <extra_id_1>Assistant\n is correctly appended at the end, as the model is trained to respond after that.
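
For illustration, a minimal sketch of the intended usage, ending the messages with a "user" turn (question is a placeholder):

prompt = tokenizer.apply_chat_template(
    [
        {"role": "user", "content": question},
    ],
    tokenize=False,
)
# prompt now ends with "<extra_id_1>Assistant\n",
# which is the cue the model was trained to respond after.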
