Update README.md
README.md CHANGED
@@ -21,7 +21,7 @@ inference: false
 
 These files are GPTQ 4bit model files for [Sambanova Systems' BLOOMChat 1.0](https://huggingface.co/sambanovasystems/BLOOMChat-176B-v1).
 
-It is the result of quantising to
+It is the result of quantising to 4-bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
 
 **This is a BIG model! 2 x 80GB or 3 x 48GB GPUs are required**
 
@@ -210,8 +210,6 @@ It was created with group_size none (-1) to reduce VRAM usage, and with --act-or
 
 This will work with AutoGPTQ. It is untested with GPTQ-for-LLaMa. It will *not* work with ExLlama.
 
-It was created with both group_size 128g and --act-order (desc_act) for increased inference quality.
-
 It was created with both group_size 128g and --act-order (desc_act) for even higher inference accuracy, at the cost of increased VRAM usage. Because we already need 2 x 80GB or 3 x 48GB GPUs, I don't expect the increased VRAM usage to change the GPU requirements.
 
 * `gptq_model-4bit-128g.safetensors`
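For context on the settings this commit describes (4-bit, group_size 128, --act-order / desc_act), here is a minimal sketch of how such a quantisation might be produced with AutoGPTQ. The calibration example and output directory are placeholders for illustration, not taken from the commit; only the config values come from the README text above.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "sambanovasystems/BLOOMChat-176B-v1"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Settings described in the README: 4-bit, group_size 128 ("128g" in the
# filename), and act-order (desc_act=True).
quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=True,
)

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)

# Placeholder calibration data; a real run would use a proper calibration set.
examples = [tokenizer("BLOOMChat is a 176B-parameter multilingual chat model.")]
model.quantize(examples)

model.save_quantized("BLOOMChat-176B-v1-GPTQ", use_safetensors=True)
```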
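And a sketch of loading the resulting `gptq_model-4bit-128g.safetensors` file for inference, matching the compatibility note in the second hunk (works with AutoGPTQ, not with ExLlama). The repo id is an assumed placeholder, and the prompt follows BLOOMChat's documented `<human>:`/`<bot>:` template; neither appears in this commit.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Assumed repo id for illustration; substitute the actual model location.
model_dir = "TheBloke/BLOOMChat-176B-v1-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    model_basename="gptq_model-4bit-128g",  # the .safetensors file named above
    use_safetensors=True,
    device_map="auto",  # shard across the 2 x 80GB or 3 x 48GB GPUs required
)

# BLOOMChat's documented prompt template.
prompt = "<human>: What is GPTQ quantisation?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because the quantisation used desc_act with a group size, the extra VRAM cost the README mentions comes from the additional per-group metadata and reordering; as the commit notes, that overhead does not change the already-large multi-GPU requirement.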