GPTQ / AWQ
#2 opened by agahebr
Hi!
I was wondering if you're planning on adding AWQ/GPTQ support for this model? I'd usually check TheBloke's page, but he seems to have been AFK lately.
I'll consider it. The problem with AWQ/GPTQ is that those formats are less compatible and flexible than GGUF/EXL2, where you can find a quant at exactly the size that fits your VRAM.
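To make the sizing point concrete, here's a back-of-envelope sketch (my own ballpark math, illustrative only) of how much VRAM the weights alone take for a ~120B model at different bits-per-weight:

```python
# Rough sketch: weight memory for a ~120B-parameter model at various
# bits-per-weight (bpw). Ignores KV cache and runtime overhead, so real
# usage is higher; numbers are illustrative only.
PARAMS = 120e9      # ~120B parameters
VRAM_GIB = 48       # total VRAM in my setup

for bpw in (2.4, 3.0, 4.0, 5.0, 8.0):   # typical EXL2/GGUF quant sizes
    weights_gib = PARAMS * bpw / 8 / 2**30
    verdict = "fits" if weights_gib < VRAM_GIB else "does not fit"
    print(f"{bpw:>4} bpw -> ~{weights_gib:6.1f} GiB of weights ({verdict} in {VRAM_GIB} GB)")
```

Most GPTQ/AWQ releases are around 4-bit, which by this math already exceeds 48 GB for a 120B model, whereas EXL2/GGUF let you go down into the 2 to 3 bpw range.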
In production, I use vLLM (the excellent aphrodite-engine fork) for fast parallel inference, but since I only have 48 GB VRAM on my systems, for Miquliz 120B I use EXL2 with ExLlamaV2 or the new 2-bit GGUF imatrix quants with llama.cpp/KoboldCpp. So I don't think AWQ/GPTQ is a good fit for 120B models as of now, and producing those quants would take a huge amount of my limited resources. (I miss TheBloke, too!)
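For reference, a minimal sketch of the GGUF route using the llama-cpp-python bindings; the model path is a placeholder, and the exact quant file name will differ:

```python
from llama_cpp import Llama

# Minimal sketch: load a 2-bit imatrix GGUF quant and offload all layers
# to the GPU. The file name below is hypothetical, not an actual release.
llm = Llama(
    model_path="miquliz-120b.IQ2_XS.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload every layer to VRAM
    n_ctx=4096,        # context window
)

out = llm("Hello, how are you?", max_tokens=64)
print(out["choices"][0]["text"])
```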