GGUFs
Collection
Collection of usable GGUFs for running LLMs on the edge or consumer devices like phones & laptops!
Contains Q4 & Q8 quantized GGUFs for google/gemma
Variant | Device | Perf |
---|---|---|
Q4 | RTX 2070S | 22 tok/s |
Q4 | M1 Pro 10-core GPU | 28 tok/s |
Q8 | RTX 2070S | 7 tok/s (could only offload 23/29 layers to GPU) |
Q8 | M1 Pro 10-core GPU | 17 tok/s |
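
To make the Q4 vs. Q8 trade-off in the table easier to read, here is a minimal sketch that computes Q8 throughput as a fraction of Q4 throughput on each device. The numbers are copied from the table; the `PERF` dictionary and the `q8_to_q4_ratio` helper are illustrative names, not part of this collection.

```python
# Throughput figures copied from the table above (tokens per second).
PERF = {
    ("Q4", "RTX 2070S"): 22,
    ("Q4", "M1 Pro 10-core GPU"): 28,
    ("Q8", "RTX 2070S"): 7,
    ("Q8", "M1 Pro 10-core GPU"): 17,
}


def q8_to_q4_ratio(device: str) -> float:
    """Return Q8 throughput as a fraction of Q4 throughput on one device."""
    return PERF[("Q8", device)] / PERF[("Q4", device)]


for device in ("RTX 2070S", "M1 Pro 10-core GPU"):
    print(f"{device}: Q8 runs at {q8_to_q4_ratio(device):.0%} of Q4 speed")
```

On these figures, Q8 runs at roughly 32% of Q4 speed on the RTX 2070S and roughly 61% on the M1 Pro. The steeper drop on the RTX 2070S is consistent with the note that only 23 of 29 layers could be offloaded to the GPU, though the table alone does not isolate the cause.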