anthracite-org/magnum-v2-4b-gguf

lucyknada committed on Aug 27

Commit 9846e95 • 1 Parent(s): 5c5b217

Update README.md

Files changed (1)

README.md +7 -1

@@ -35,6 +35,10 @@ Can I ask a question?<|im_end|>
 ## Support
 To run inference on this model, you'll need to use Aphrodite, vLLM or EXL2/tabbyAPI, as llama.cpp hasn't yet merged the required pull request to fix the llama3.1 rope_freqs issue with custom head dimensions.
 However, you can work around this by quantizing the model yourself to create a functional GGUF file. Note that until [this PR](https://github.com/ggerganov/llama.cpp/pull/9141) is merged, the context will be limited to 8k tokens.
@@ -44,7 +48,9 @@ To create a working GGUF file, make the following adjustments:
 1. Remove the `"rope_scaling": {}` entry from `config.json`
 2. Change `"max_position_embeddings"` to `8192` in `config.json`
-These modifications should allow you to use the model with llama.cpp, albeit with the mentioned context limitation.
 ## axolotl config

 ## Support
+Upstream support has been merged, so these quants work out of the box now!
+<details><summary>old instructions before PR</summary>
 To run inference on this model, you'll need to use Aphrodite, vLLM or EXL2/tabbyAPI, as llama.cpp hasn't yet merged the required pull request to fix the llama3.1 rope_freqs issue with custom head dimensions.
 However, you can work around this by quantizing the model yourself to create a functional GGUF file. Note that until [this PR](https://github.com/ggerganov/llama.cpp/pull/9141) is merged, the context will be limited to 8k tokens.
 1. Remove the `"rope_scaling": {}` entry from `config.json`
 2. Change `"max_position_embeddings"` to `8192` in `config.json`
+These modifications should allow you to use the model with llama.cpp, albeit with the mentioned context limitation.</strike>
+</details><br>
 ## axolotl config
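
Note on the old instructions in the diff above: the two `config.json` edits (removing `rope_scaling` and setting `max_position_embeddings` to `8192`) can be sketched in Python as follows. This is a minimal sketch, not part of the commit. The helper name `patch_config` and the file path are hypothetical, and per the updated README the workaround should only matter for llama.cpp builds that predate the upstream fix.

```python
import json
from pathlib import Path


def patch_config(config_path):
    """Apply the two config.json edits from the old instructions.

    `patch_config` is a hypothetical helper name used for illustration.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Step 1: remove the "rope_scaling" entry (no error if it is absent).
    config.pop("rope_scaling", None)

    # Step 2: cap the context length at 8192 tokens.
    config["max_position_embeddings"] = 8192

    # Write the patched config back in place.
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Assuming the unquantized model has been downloaded to a local directory, calling `patch_config("path/to/model/config.json")` before running the GGUF conversion would apply both steps. All other keys in the file are left untouched.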