What is the context length in training?

#1
by xuxiu - opened

I am reproducing this model using the 0625 dataset and qwen2-7B, and I would like to know what context length you used in training. I saw that you concatenate the samples into one sequence. I am doing the same, but I don't know exactly what length you used.

Beijing Academy of Artificial Intelligence org

Hi, we set it to 4096.
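
For anyone reproducing this, here is a minimal sketch of what concatenating samples into fixed-length sequences could look like with a length of 4096. It is not the authors' training code. The function name `pack_samples`, the `eos_id` separator, and the choice to split samples across chunk boundaries and keep the last short chunk are all assumptions made for illustration; the thread only confirms the length.

```python
from typing import Iterable, List


def pack_samples(
    samples: Iterable[List[int]],
    max_len: int = 4096,   # context length given in the reply above
    eos_id: int = 0,       # hypothetical separator id; use your tokenizer's EOS id
) -> List[List[int]]:
    """Concatenate tokenized samples into one stream and cut it into
    chunks of at most max_len tokens (assumed packing scheme)."""
    buffer: List[int] = []
    packed: List[List[int]] = []
    for ids in samples:
        # Append the sample followed by a separator so boundaries stay visible.
        buffer.extend(ids)
        buffer.append(eos_id)
        # Emit full chunks as soon as enough tokens have accumulated.
        while len(buffer) >= max_len:
            packed.append(buffer[:max_len])
            buffer = buffer[max_len:]
    if buffer:
        # Remaining tokens form a shorter final chunk; pad or drop it as needed.
        packed.append(buffer)
    return packed


if __name__ == "__main__":
    # Tiny example with max_len=4 so the chunking is easy to follow.
    toy = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    for chunk in pack_samples(toy, max_len=4, eos_id=0):
        print(chunk)
```

Whether samples are allowed to straddle a chunk boundary, and whether attention is masked between packed samples, is not stated in this thread, so treat those details as open choices.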
