Arki05 committed
Commit 62c6036
Parent: e6c7fa9

Update README.md

Files changed (1):
  1. README.md +16 -7
README.md CHANGED
@@ -3,21 +3,30 @@ license: apache-2.0
  ---
  # Grok-1 GGUF Quantizations

- > [!WARNING]
- > As discovered by [@DgDev91](https://huggingface.co/Arki05/Grok-1-GGUF/discussions/8), there is a slight issue with file naming when using these quants with current llama.cpp.
- >
- > A fix is already provided by @phymbert in [#6192](https://github.com/ggerganov/llama.cpp/pull/6192).
- >
- > For ease of use, I've created a branch ([Quick-Fix Branch](https://github.com/arki05/llama.cpp-grok/tree/quick-fix-grok-split)) that incorporates these fixes.
-

  This repository contains unofficial GGUF Quantizations of Grok-1, compatible with `llama.cpp` as of [PR #6204: Add grok-1 support](https://github.com/ggerganov/llama.cpp/pull/6204).

  ## Updates

+ #### Native Split Support in llama.cpp
  - The splits have been updated to use the improvements from [PR #6187: llama_model_loader: support multiple split/shard GGUFs](https://github.com/ggerganov/llama.cpp/pull/6187). As a result, manual merging with `gguf-split` is no longer required.

  With this, there is no need to merge the split files before use. Just download all splits and run llama.cpp with the first split as you would previously; it will detect and load the other splits as well.
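As an illustration, a minimal sketch of that flow; it assumes the IQ3_XS splits from this repo, a built llama.cpp `main` binary, and that `huggingface-cli` is installed:

```
# Grab every IQ3_XS split from the repo (file names assumed from the
# split pattern used in this repo).
huggingface-cli download Arki05/Grok-1-GGUF \
  --include "grok-1-IQ3_XS-split-*.gguf" \
  --local-dir models

# Run with the first split only; llama.cpp detects and loads the
# remaining splits in the same directory automatically.
./main -m models/grok-1-IQ3_XS-split-00001-of-00009.gguf -ngl 999 -p "Hello"
```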
+ #### Direct Split Download from Hugging Face using llama.cpp
+ - Thanks to the new PR [#6192: common: llama_load_model_from_url split support](https://github.com/ggerganov/llama.cpp/pull/6192) from @phymbert, it is now possible to load model splits directly from a URL.
+
+ That means the following command downloads and runs the model:
+
+ ```
+ server \
+   --hf-repo Arki05/Grok-1-GGUF \
+   --hf-file grok-1-IQ3_XS-split-00001-of-00009.gguf \
+   --model models/grok-1-IQ3_XS-split-00001-of-00009.gguf \
+   -ngl 999
+ ```
+
+ And that is very cool (@phymbert).
+
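Presumably the same works when pointing directly at a raw file URL instead of the repo/file pair; a sketch, assuming the `--model-url` flag that the linked PR builds on and this repo's standard Hugging Face resolve URLs:

```
# Point at the first split's URL; per PR #6192, the remaining splits
# are resolved and downloaded as well.
server \
  --model-url https://huggingface.co/Arki05/Grok-1-GGUF/resolve/main/grok-1-IQ3_XS-split-00001-of-00009.gguf \
  -ngl 999
```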
 
  ## Available Quantizations