Add IQ Quantization support with the help of imatrix and GPUs

#35
by qnixsynapse - opened

It will allow us to create imatrix data and quants in one go!

Super useful when dealing with 100B+ models; 1-bit (IQ1_M) would be really nice to support.

Would be awesome to see options for IQ6 / IQ5 / IQ4 / IQ3 / IQ2 (NL / XS variants).

Would be really awesome to have an option to upload a .txt file for imatrix creation and then create imatrix quants with it.
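For reference, the requested workflow maps onto llama.cpp's command-line tools roughly like this (a sketch with hypothetical file names, assuming the `llama-imatrix` and `llama-quantize` binaries from a recent llama.cpp build):

```shell
# 1. Compute importance-matrix data from an uploaded calibration text file.
#    model-f16.gguf and calibration.txt are placeholder names.
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize using the imatrix, e.g. to IQ4_XS.
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-IQ4_XS.gguf IQ4_XS
```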

#78 should help here if merged

ggml.ai org

We just merged support for iMatrix! Do let us know if you have any feedback! 🤗

@reach-vb Just gave it a try. I have one suggestion. Currently, it is impossible to see the progress because the Gradio UI only shows a loading indicator. I think it would be better if the console logs were shown instead. That would let us track progress and inspect any errors encountered during calculation/conversion. :)

Thanks to you and everybody else involved. I should close this discussion now. :)

qnixsynapse changed discussion status to closed
ggml.ai org

That's brilliant feedback!

@reach-vb Hi! When will llama.cpp get updated here?
Sorry to bother you, but currently Gemma (9B) conversion fails because of an assert, which has been fixed upstream.

We need at least b3389 for the fix.

I was not sure how to contact you, so commented here. 😃
