These are here for reference, comparison, and any future work.

The quality of the llamafiles generated from these freshly converted GGUFs was noticeably better than that of the llamafiles generated from the other GGUFs on HF.

quant file notes:
- q3-k-m: fits entirely on a 4090 (24 GB VRAM); very fast inference
- q4-0: for some reason, higher quality than q4-k-m
- q4-k-m: widely accepted as "good enough" and the general favorite for most models, but in this case it does not fit on a 4090 (see the rough size sketch below)
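A rough back-of-envelope way to see why some quants fit on a 24 GB card and others don't: the weights take roughly params × bits-per-weight / 8 bytes, plus some headroom for the KV cache and runtime overhead. The sketch below is a minimal illustration, not a measurement: the bits-per-weight figures are approximate llama.cpp values, and `n_params` is a hypothetical placeholder, since this README does not state the model's parameter count.

```python
# Back-of-envelope VRAM check: does a quantized model fit on a 24 GB card?
# The bpw values are approximate llama.cpp figures for each quant type,
# and the parameter count below is a hypothetical example, not this model's.

BPW = {"q3-k-m": 3.9, "q4-0": 4.55, "q4-k-m": 4.85}  # approx bits per weight

def fits_in_vram(n_params: float, quant: str,
                 vram_gib: float = 24.0, overhead_gib: float = 2.0) -> bool:
    """Estimate whether the quantized weights, plus a fixed margin for
    KV cache and runtime overhead, fit in the given amount of VRAM."""
    weights_gib = n_params * BPW[quant] / 8 / 2**30
    return weights_gib + overhead_gib <= vram_gib

if __name__ == "__main__":
    n = 40e9  # hypothetical 40B-parameter model
    for quant in BPW:
        print(f"{quant}: fits on 24 GB -> {fits_in_vram(n, quant)}")
```

With the hypothetical 40B figure, q3-k-m lands well under 24 GiB while q4-k-m overflows, matching the pattern described above; the real cutoff depends on the actual parameter count and context size.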