Gemma 7B Instruct GGUF

Contains Q4 and Q8 quantized GGUF files for google/gemma.

Performance

Variant	Device	Throughput
Q4	RTX 2070S	22 tok/s
Q4	M1 Pro 10-core GPU	28 tok/s
Q8	RTX 2070S	7 tok/s (could only offload 23/29 layers to GPU)
Q8	M1 Pro 10-core GPU	17 tok/s

GGUF

Model size: 8.54B params
Architecture: gemma
Quantizations: 4-bit, 8-bit
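
The partial offload noted for Q8 on the RTX 2070S is consistent with simple size arithmetic. The sketch below estimates the nominal weight footprint from the parameter count and bit width listed above. It assumes exactly 4 and 8 bits per weight (real GGUF quantization formats store extra per-block scale data, so actual files are somewhat larger) and assumes a stock 8 GB RTX 2070 Super. The function names are illustrative only, and the estimate ignores the KV cache and runtime overhead.

```python
# Nominal weight footprint: params * bits_per_weight / 8 bytes.
PARAMS = 8.54e9   # parameter count listed for this model
VRAM_GB = 8.0     # assumption: stock RTX 2070 Super with 8 GB of memory


def weight_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def fits_in_vram(size_gb: float, vram_gb: float) -> bool:
    """True if the weights alone are smaller than the available VRAM."""
    return size_gb < vram_gb


# Assumption: nominal bit widths; actual quantized files are larger.
for name, bits in (("Q4", 4), ("Q8", 8)):
    size = weight_size_gb(PARAMS, bits)
    fits = fits_in_vram(size, VRAM_GB)
    print(f"{name}: ~{size:.2f} GB of weights, fits in {VRAM_GB:.0f} GB VRAM: {fits}")
```

Under these assumptions the Q4 weights come to roughly 4.27 GB and the Q8 weights to roughly 8.54 GB, so the Q8 variant cannot sit entirely on an 8 GB card, and some layers have to stay on the CPU.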
