removing ARM quants

Files changed (5)

DeepSeek-V2.5-Q4_0_8_8/DeepSeek-V2.5-Q4_0_8_8-00001-of-00004.gguf +0 -3
DeepSeek-V2.5-Q4_0_8_8/DeepSeek-V2.5-Q4_0_8_8-00002-of-00004.gguf +0 -3
DeepSeek-V2.5-Q4_0_8_8/DeepSeek-V2.5-Q4_0_8_8-00003-of-00004.gguf +0 -3
DeepSeek-V2.5-Q4_0_8_8/DeepSeek-V2.5-Q4_0_8_8-00004-of-00004.gguf +0 -3
README.md +0 -7

DeepSeek-V2.5-Q4_0_8_8/DeepSeek-V2.5-Q4_0_8_8-00001-of-00004.gguf DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:5eebd6e6dfa8f5770fe170397854f090a6196f127a8568c0ae6848c743a39868
-size 39658925312

DeepSeek-V2.5-Q4_0_8_8/DeepSeek-V2.5-Q4_0_8_8-00002-of-00004.gguf DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:648bef998337f9868e04cd99414037370bd180d5f2f5beb840f80e691c4dd3cf
-size 39675124704

DeepSeek-V2.5-Q4_0_8_8/DeepSeek-V2.5-Q4_0_8_8-00003-of-00004.gguf DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:53ca9a75c289504ca24cb1af67bd1a4c5b16ba79d7ba09b0cea89f8ca94db18b
-size 39447549440

DeepSeek-V2.5-Q4_0_8_8/DeepSeek-V2.5-Q4_0_8_8-00004-of-00004.gguf DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:58cf99d2f5488a43247bc3ef5e1f543fabc304c9e15f61cc5bec616e7fce7971
-size 14130858304
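
Each deleted `.gguf` entry above is a Git LFS pointer (a `version`, `oid`, and `size` line), not the model weights themselves. As a rough sanity check, the sketch below parses the four pointers and sums their `size` fields. The helper names `parse_pointer` and `total_size_bytes` are hypothetical, and reading the README's "GB" as decimal gigabytes (10^9 bytes) is an assumption.

```python
# Git LFS pointer contents for the four deleted shards, copied from this diff.
POINTERS = [
    """version https://git-lfs.github.com/spec/v1
oid sha256:5eebd6e6dfa8f5770fe170397854f090a6196f127a8568c0ae6848c743a39868
size 39658925312""",
    """version https://git-lfs.github.com/spec/v1
oid sha256:648bef998337f9868e04cd99414037370bd180d5f2f5beb840f80e691c4dd3cf
size 39675124704""",
    """version https://git-lfs.github.com/spec/v1
oid sha256:53ca9a75c289504ca24cb1af67bd1a4c5b16ba79d7ba09b0cea89f8ca94db18b
size 39447549440""",
    """version https://git-lfs.github.com/spec/v1
oid sha256:58cf99d2f5488a43247bc3ef5e1f543fabc304c9e15f61cc5bec616e7fce7971
size 14130858304""",
]


def parse_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of an LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def total_size_bytes(pointers: list[str]) -> int:
    """Sum the 'size' field (bytes) across all pointer texts."""
    return sum(int(parse_pointer(p)["size"]) for p in pointers)


total = total_size_bytes(POINTERS)
# Assumption: "GB" means 10**9 bytes, not GiB (2**30 bytes).
print(f"{total} bytes = {total / 1e9:.2f} GB")
```

Under that assumption the shards total about 132.91 GB, which lines up with the size listed for the Q4_0_8_8 row removed from README.md.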

README.md CHANGED

@@ -32,7 +32,6 @@ Run them in [LM Studio](https://lmstudio.ai/)
 | [DeepSeek-V2.5-Q5_K_M.gguf](https://huggingface.co/bartowski/DeepSeek-V2.5-GGUF/tree/main/DeepSeek-V2.5-Q5_K_M) | Q5_K_M | 167.22GB | true | High quality, *recommended*. |
 | [DeepSeek-V2.5-Q4_K_M.gguf](https://huggingface.co/bartowski/DeepSeek-V2.5-GGUF/tree/main/DeepSeek-V2.5-Q4_K_M) | Q4_K_M | 142.45GB | true | Good quality, default size for must use cases, *recommended*. |
 | [DeepSeek-V2.5-Q4_0.gguf](https://huggingface.co/bartowski/DeepSeek-V2.5-GGUF/tree/main/DeepSeek-V2.5-Q4_0) | Q4_0 | 133.39GB | true | Legacy format, generally not worth using over similarly sized formats |
-| [DeepSeek-V2.5-Q4_0_8_8.gguf](https://huggingface.co/bartowski/DeepSeek-V2.5-GGUF/tree/main/DeepSeek-V2.5-Q4_0_8_8) | Q4_0_8_8 | 132.91GB | true | Optimized for ARM inference. Requires 'sve' support (see link below). |
 | [DeepSeek-V2.5-IQ4_XS.gguf](https://huggingface.co/bartowski/DeepSeek-V2.5-GGUF/tree/main/DeepSeek-V2.5-IQ4_XS) | IQ4_XS | 125.56GB | true | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
 | [DeepSeek-V2.5-Q3_K_XL.gguf](https://huggingface.co/bartowski/DeepSeek-V2.5-GGUF/tree/main/DeepSeek-V2.5-Q3_K_XL) | Q3_K_XL | 122.83GB | true | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
 | [DeepSeek-V2.5-Q3_K_L.gguf](https://huggingface.co/bartowski/DeepSeek-V2.5-GGUF/tree/main/DeepSeek-V2.5-Q3_K_L) | Q3_K_L | 122.37GB | true | Lower quality but usable, good for low RAM availability. |
@@ -76,12 +75,6 @@ huggingface-cli download bartowski/DeepSeek-V2.5-GGUF --include "DeepSeek-V2.5-Q
 You can either specify a new local-dir (DeepSeek-V2.5-Q8_0) or download them all in place (./)
-## Q4_0_X_X
-If you're using an ARM chip, the Q4_0_X_X quants will have a substantial speedup. Check out Q4_0_4_4 speed comparisons [on the original pull request](https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660)
-To check which one would work best for your ARM chip, you can check [AArch64 SoC features](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) (thanks EloyOn!).
 ## Which file should I choose?
 A great write up with charts showing various performances is provided by Artefact2 [here](https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9)
