I want to build a chatbot using the Mixtral-8x7B-Instruct-v0.1 model, but the inference speed is very slow, so I cannot use it as a chatbot. How can I fix this issue?
#211 · opened by rising620
I have already tried several optimizations, such as flash attention and model quantization, to improve the inference speed, but it is still too slow for chatbot use. Please help me figure out how to speed up this model.
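For reference, here is a minimal sketch of the kind of setup described above (4-bit quantization via bitsandbytes plus FlashAttention-2 in transformers). This is illustrative only, not the original poster's code; the model id is the one published on the Hub, while the generation parameters and prompt are assumptions.

```python
# Sketch: load Mixtral-8x7B-Instruct-v0.1 with 4-bit quantization and FlashAttention-2,
# the two optimizations mentioned above. Requires transformers >= 4.36, bitsandbytes,
# and the flash-attn package installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# 4-bit NF4 quantization keeps the 8x7B weights within GPU memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # FlashAttention-2 kernel
    device_map="auto",
)

# Build a chat prompt and generate a reply (example message is hypothetical).
messages = [{"role": "user", "content": "Hello, who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```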
Hi, did you resolve the issue? I am facing the same issue as well. Can you help me, please?