Optimizing Mixtral-8x7B-Instruct-v0.1 for Hugging Face Chat

#54
by Husain - opened

What kind of optimizations are used to run MistralAI/Mixtral-8x7B-Instruct-v0.1 in Hugging Face Chat (https://huggingface.co/chat)? Is it the default model running in full precision?
Or are optimizations applied to reduce the memory required to run the model, such as float16 or 8-bit/4-bit quantization with bitsandbytes?
Is Flash Attention 2 used too?
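
For context on why the precision matters, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone at each precision mentioned above. It assumes a total of about 46.7 billion parameters for Mixtral-8x7B and ignores the KV cache, activations, and any quantization overhead, so the real footprint would be higher. The function names are only illustrative, and this says nothing about what Hugging Face Chat actually runs.

```python
# Assumption: Mixtral-8x7B has roughly 46.7 billion parameters in total.
PARAMS = 46.7e9

# Bytes needed to store one weight in each format (weights only).
BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(num_params, bytes_per_param):
    """Return the approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def estimate_all(num_params=PARAMS):
    """Map each storage format to its approximate weight memory in GB."""
    return {
        name: weight_memory_gb(num_params, size)
        for name, size in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for name, gb in estimate_all().items():
        print(f"{name:>8}: ~{gb:.0f} GB")
```

Under that assumption, float16 halves the weight memory relative to float32, and 4-bit quantization cuts it to about one eighth, which is why these options decide how many GPUs the model needs.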
