More explanation on performance?
On the model card you say, "both f16.q6 and f16.q5 are smaller than q8_0 standard quantization and they perform as well as the pure f16."
A couple of things: I can't see any f16.q6 or f16.q5 files in the repo. Are those coming soon?
Also, can you explain the performance claim, or at least elaborate a bit more? Are we talking benchmarks, speed, etc.?
Great job by the way and thanks!
Sorry for the confusion (lol, I sounded like an LLM): the files in this repo all have f16 output and embed tensors (except the q8_p).