More quants?
#5 by MoonRide - opened
Could you please provide more quants, such as Q6_K and Q5_K_M? Those offer better quality than Q4 (sample perplexity results in https://github.com/ggerganov/llama.cpp/blob/master/examples/perplexity/README.md).
For anyone interested: quants created from F16 via `quantize Phi-3-mini-4k-instruct-fp16.gguf Phi-3-mini-4k-instruct-Q6_K.gguf Q6_K` work fine (tested using llama.cpp b2714).
What quantize command are you using there?
Figured it out, it was this one: https://github.com/ggerganov/llama.cpp/blob/master/examples/quantize/quantize.cpp
@simonw It's just the standard `quantize` executable included in llama.cpp, which I mentioned earlier (you can also grab the binary releases of llama.cpp, published at https://github.com/ggerganov/llama.cpp/releases).
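To spell out the workflow above: the `quantize` binary takes the source GGUF, the output path, and the target quant type as its last argument. A sketch assuming an F16 GGUF is already in the current directory (file names are examples taken from this thread):

```shell
# Sketch: produce Q6_K and Q5_K_M quants from an existing F16 GGUF.
# The `quantize` binary ships with llama.cpp (b2714 in this thread);
# build it from source or use a published binary release.
./quantize Phi-3-mini-4k-instruct-fp16.gguf Phi-3-mini-4k-instruct-Q6_K.gguf Q6_K
./quantize Phi-3-mini-4k-instruct-fp16.gguf Phi-3-mini-4k-instruct-Q5_K_M.gguf Q5_K_M
```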
gugarosa changed discussion status to closed