Update README.md
--- a/README.md
+++ b/README.md
@@ -48,27 +48,27 @@ Open the text-generation-webui UI as normal.
 5. Click the **Refresh** icon next to **Model** in the top left.
 6. In the **Model drop-down**: choose the model you just downloaded, `Samantha-33B-GPTQ`.
 7. If you see an error in the bottom right, ignore it - it's temporary.
-8. Fill out the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize =
+8. Fill out the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = None`, `model_type = Llama`
 9. Click **Save settings for this model** in the top right.
 10. Click **Reload the Model** in the top right.
 11. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
 
 ## Provided files
 
-**Samantha-33B-GPTQ-4bit
+**Samantha-33B-GPTQ-4bit.act-order.safetensors**
 
 This will work with all versions of GPTQ-for-LLaMa. It has maximum compatibility.
 
-It was created with groupsize
+It was created with no groupsize to minimise VRAM requirements, and with act-order to ensure highest possible inference quality.
 
-* `Samantha-33B-GPTQ-4bit
+* `Samantha-33B-GPTQ-4bit.act-order.safetensors`
 * Works with all versions of GPTQ-for-LLaMa code, both Triton and CUDA branches
 * Works with AutoGPTQ
 * Works with text-generation-webui one-click-installers
-* Parameters: Groupsize =
+* Parameters: Groupsize = None. Act-order.
 * Command used to create the GPTQ:
 ```
-python llama.py /workspace/process/samantha-33B/HF wikitext2 --wbits 4 --true-sequential --
+python llama.py /workspace/process/samantha-33B/HF wikitext2 --wbits 4 --true-sequential --act-order --save_safetensors /workspace/process/Samantha-33B-GPTQ-4bit.act-order.safetensors
 ```
 ## Want to support my work?
 
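Review note: the updated parameters line ("Groupsize = None. Act-order.") and the updated command describe the same settings. The sketch below is one way to check that they agree by assembling the command from those settings. The helper name `build_quantize_command` and its arguments are hypothetical and are not part of GPTQ-for-LLaMa; only the flags already quoted in the README are used, and "no groupsize" is represented simply by not adding any group-size flag.

```python
import shlex


def build_quantize_command(model_dir, output_file, wbits=4, act_order=True):
    """Assemble the llama.py argument list quoted in the README.

    Hypothetical helper for illustration only; it uses just the flags
    that appear in the README's own command.
    """
    args = ["python", "llama.py", model_dir, "wikitext2",
            "--wbits", str(wbits), "--true-sequential"]
    if act_order:
        args.append("--act-order")
    # No group-size flag is added here: the README states Groupsize = None.
    args += ["--save_safetensors", output_file]
    return args


command = build_quantize_command(
    "/workspace/process/samantha-33B/HF",
    "/workspace/process/Samantha-33B-GPTQ-4bit.act-order.safetensors",
)
# Print the command as a single shell-quoted string.
print(shlex.join(command))
```

With the default arguments, the printed string should match the command on the added side of the diff, which makes it easy to spot a mismatch between the stated parameters and the flags actually used.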