Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ inference: false
 
 These files are GPTQ 4bit model files for [Sambanova Systems' BLOOMChat 1.0](https://huggingface.co/sambanovasystems/BLOOMChat-176B-v1).
 
-It is the result of quantising to
+It is the result of quantising to 4-bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
 
 **This is a BIG model! 2 x 80GB or 3 x 48GB GPUs are required**
 
@@ -210,8 +210,6 @@ It was created with group_size none (-1) to reduce VRAM usage, and with --act-or
 
 This will work with AutoGPTQ. It is untested with GPTQ-for-LLaMa. It will *not* work with ExLlama.
 
-It was created with both group_size 128g and --act-order (desc_act) for increased inference quality.
-
 It was created with both group_size 128g and --act-order (desc_act) for even higher inference accuracy, at the cost of increased VRAM usage. Because we already need 2 x 80GB or 3 x 48GB GPUs, I don't expect the increased VRAM usage to change the GPU requirements.
 
 * `gptq_model-4bit-128g.safetensors`
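For context on the settings this commit describes (4-bit, group_size 128, --act-order / desc_act), here is a minimal sketch of how such a quantisation might be produced with AutoGPTQ. The calibration example and output directory are placeholders for illustration, not taken from the commit; only the config values come from the README text above.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "sambanovasystems/BLOOMChat-176B-v1"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings described in the README: 4-bit, group_size 128 ("128g" in the
# filename), and act-order (desc_act=True).
quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=True,
)

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)

# Placeholder calibration data; a real run would use a proper calibration set.
examples = [tokenizer("BLOOMChat is a 176B-parameter multilingual chat model.")]
model.quantize(examples)

model.save_quantized("BLOOMChat-176B-v1-GPTQ", use_safetensors=True)
```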
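And a sketch of loading the resulting `gptq_model-4bit-128g.safetensors` file for inference, matching the compatibility note in the second hunk (works with AutoGPTQ, not with ExLlama). The repo id is an assumed placeholder, and the prompt follows BLOOMChat's documented `<human>:`/`<bot>:` template; neither appears in this commit.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Assumed repo id for illustration; substitute the actual model location.
model_dir = "TheBloke/BLOOMChat-176B-v1-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    model_basename="gptq_model-4bit-128g",  # the .safetensors file named above
    use_safetensors=True,
    device_map="auto",  # shard across the 2 x 80GB or 3 x 48GB GPUs required
)

# BLOOMChat's documented prompt template.
prompt = "<human>: What is GPTQ quantisation?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the quantisation used desc_act with a group size, the extra VRAM cost the README mentions comes from the additional per-group metadata and reordering; as the commit notes, that overhead does not change the already-large multi-GPU requirement.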