mistralai/Mixtral-8x7B-Instruct-v0.1 · Trainer RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 0

Hi everyone,

I'm fine-tuning the Mixtral 8x7B model and have run into an issue with the 8-bit quantized version: training consistently fails with the RuntimeError shown in the title. Interestingly, the same process works perfectly fine when I use the 4-bit quantized version.
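
For anyone reading the message in the title: as far as I understand it, this is PyTorch's broadcasting check failing. Two tensors were combined elementwise, their sizes along dimension 0 differ (32 vs. 8), and neither size is 1. The sketch below only mimics that rule in plain Python so the message is easier to interpret. It is not PyTorch's actual implementation, the helper name `first_broadcast_mismatch` is made up for illustration, and I don't know which two tensors in the 8-bit path are involved.

```python
from typing import Optional, Sequence


def first_broadcast_mismatch(
    shape_a: Sequence[int], shape_b: Sequence[int]
) -> Optional[int]:
    """Return the index of the first dimension (scanning from the last
    dimension backwards) where two shapes cannot be broadcast together,
    or None if they are compatible.

    Hypothetical helper: it mirrors the usual broadcasting rule
    (sizes must be equal, or one of them must be 1).
    """
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    # Compare sizes starting from the trailing dimension.
    for dim in range(ndim - 1, -1, -1):
        if a[dim] != b[dim] and a[dim] != 1 and b[dim] != 1:
            return dim
    return None


# The sizes from the error message: 32 vs. 8 at dimension 0.
print(first_broadcast_mismatch((32,), (8,)))
# A size of 1 broadcasts, so this pair is compatible.
print(first_broadcast_mismatch((1,), (8,)))
```

So somewhere in the 8-bit run, a tensor with 32 entries along its first dimension meets one with 8. Knowing which operation that is (from the full traceback) would probably narrow things down.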

I've attached a screenshot of the error message for reference. Has anyone experienced something similar, or does anyone have suggestions on how to resolve this? Any help would be greatly appreciated!

Thank you!