What is the context length in training?
#1, opened by xuxiu
I am reproducing this model using the 0625 dataset and qwen2-7B, and I would like to know what context length you used in training. I saw that you concatenate the samples into one sequence. I am doing the same, but I don't know the exact length you used.
Hi, we set the context length to 4096.
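For anyone reproducing this, here is a minimal sketch of how concatenating samples into fixed-length sequences could look with a context length of 4096. It is not the authors' actual code: the function name `pack_samples`, the `eos_id` separator, and the `drop_last` handling of the leftover tokens are all assumptions made for illustration, since the thread only confirms the length itself.

```python
from typing import Iterable, List

CONTEXT_LENGTH = 4096  # the value stated in the reply above


def pack_samples(
    samples: Iterable[List[int]],
    context_length: int = CONTEXT_LENGTH,
    eos_id: int = 0,          # assumption: placeholder id, use your tokenizer's EOS id
    drop_last: bool = True,   # assumption: discard the final partial sequence
) -> List[List[int]]:
    """Concatenate tokenized samples and cut the stream into fixed-length sequences."""
    buffer: List[int] = []
    packed: List[List[int]] = []
    for ids in samples:
        # Append the sample followed by a separator so sample boundaries stay visible.
        buffer.extend(ids)
        buffer.append(eos_id)
        # Emit full sequences as soon as enough tokens have accumulated.
        while len(buffer) >= context_length:
            packed.append(buffer[:context_length])
            buffer = buffer[context_length:]
    # Keep the shorter leftover sequence only if the caller asks for it.
    if buffer and not drop_last:
        packed.append(buffer)
    return packed


# Toy example with a tiny context length so the result is easy to inspect.
toy = [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(pack_samples(toy, context_length=8))
```

With real data you would pass the token ids produced by the qwen2-7B tokenizer and leave `context_length` at its default. Whether samples that straddle a boundary should be split, as in this sketch, or moved whole into the next sequence is a design choice the thread does not settle.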