GPTQ / AWQ
#2 opened by agahebr
Hi!
I was wondering if you're planning on adding AWQ/GPTQ support for this model? I'd usually check TheBloke's page, but he seems to have been AFK lately.
I'll consider it. The problem with AWQ/GPTQ is that those formats are less compatible and flexible than GGUF/EXL2, where you can find a quant at exactly the size that fits your VRAM.
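To make the sizing point concrete, here's a back-of-envelope sketch (my own ballpark math, illustrative only) of how much VRAM the weights alone take for a ~120B model at different bits-per-weight:

```python
# Rough sketch: weight memory for a ~120B-parameter model at various
# bits-per-weight (bpw). Ignores KV cache and runtime overhead, so real
# usage is higher; numbers are illustrative only.
PARAMS = 120e9      # ~120B parameters
VRAM_GIB = 48       # total VRAM in my setup

for bpw in (2.4, 3.0, 4.0, 5.0, 8.0):   # typical EXL2/GGUF quant sizes
    weights_gib = PARAMS * bpw / 8 / 2**30
    verdict = "fits" if weights_gib < VRAM_GIB else "does not fit"
    print(f"{bpw:>4} bpw -> ~{weights_gib:6.1f} GiB of weights ({verdict} in {VRAM_GIB} GB)")
```

Most GPTQ/AWQ releases are around 4-bit, which by this math already exceeds 48 GB for a 120B model, whereas EXL2/GGUF let you go down into the 2 to 3 bpw range.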
In production, I use vLLM (the excellent aphrodite-engine fork) for fast parallel inference, but since I only have 48 GB VRAM on my systems, for Miquliz 120B I use EXL2 with ExLlamaV2 or the new 2-bit GGUF imatrix quants with llama.cpp/KoboldCpp. So I don't think AWQ/GPTQ is a good fit for 120B models as of now, and producing those quants would take a huge amount of my limited resources. (I miss TheBloke, too!)
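For reference, a minimal sketch of the GGUF route using the llama-cpp-python bindings; the model path is a placeholder, and the exact quant file name will differ:

```python
from llama_cpp import Llama

# Minimal sketch: load a 2-bit imatrix GGUF quant and offload all layers
# to the GPU. The file name below is hypothetical, not an actual release.
llm = Llama(
    model_path="miquliz-120b.IQ2_XS.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload every layer to VRAM
    n_ctx=4096,        # context window
)

out = llm("Hello, how are you?", max_tokens=64)
print(out["choices"][0]["text"])
```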