ehartford/WizardLM-7B-Uncensored, quantized to 8-bit GPTQ with act order and true sequential, and no group size.
For most uses this probably isn't what you want.
For a 4-bit model with no act order, or for compatibility with `old-cuda` (the text-generation-webui default), see TheBloke/WizardLM-7B-uncensored-GPTQ.
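One practical difference between this 8-bit model and a 4-bit one is weight size. As a rough back-of-the-envelope sketch (assuming about 6.7 billion parameters for a LLaMA-7B-class model and ignoring per-row scales, zero-points and any unquantized layers), storage grows linearly with bits per weight. The helper `approx_weight_gib` is hypothetical:

```python
# Rough weight-storage estimate for a quantized model.
# Assumption: ~6.7e9 parameters (LLaMA-7B class). Scales/zero-points and
# unquantized layers are ignored, so real checkpoint files will be larger.


def approx_weight_gib(n_params: float, bits: int) -> float:
    """Return the approximate weight size in GiB at `bits` bits per parameter."""
    bytes_total = n_params * bits / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    n_params = 6.7e9  # assumed parameter count
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: ~{approx_weight_gib(n_params, bits):.1f} GiB")
```

By this estimate the 8-bit weights take roughly twice the memory of the 4-bit ones.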
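Quantized using AutoGPTQ with the following config:

```python
# Settings passed to AutoGPTQ; no group size is set (see quantize.py).
config: dict = dict(
    # bits: 8-bit weights; desc_act: act order; true_sequential: quantize sub-layers in order
    quantize_config=dict(bits=8, desc_act=True, true_sequential=True, model_file_base_name='WizardLM-7B-Uncensored'),
    use_safetensors=True  # save weights in .safetensors format
)
```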
See `quantize.py` for the full script.
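The `desc_act=True` option in the config enables act order. As an illustration of the idea only (a pure-Python sketch, not AutoGPTQ's implementation), GPTQ with act order quantizes a layer's input columns in order of decreasing Hessian diagonal; since the Hessian is proportional to XᵀX for calibration activations X (rows are samples), that diagonal is each input channel's sum of squared activations. The function names and sample data are hypothetical:

```python
# Minimal sketch of the "act order" (desc_act) column ordering used by GPTQ.
# Columns that see the largest activations are quantized first.
from typing import List


def hessian_diagonal(calib_inputs: List[List[float]]) -> List[float]:
    """diag(X^T X) for calibration samples X (one row per sample).

    Each entry is the sum of squared activations seen by one input channel.
    """
    n_cols = len(calib_inputs[0])
    return [sum(row[j] ** 2 for row in calib_inputs) for j in range(n_cols)]


def act_order_permutation(calib_inputs: List[List[float]]) -> List[int]:
    """Input-column indices sorted by descending Hessian diagonal."""
    diag = hessian_diagonal(calib_inputs)
    return sorted(range(len(diag)), key=lambda j: diag[j], reverse=True)


if __name__ == "__main__":
    # Three hypothetical calibration samples with four input channels each.
    X = [
        [0.1, 2.0, 0.0, 0.5],
        [0.2, 1.5, 0.1, 0.4],
        [0.0, 1.8, 0.0, 0.6],
    ]
    print(act_order_permutation(X))
```

Without act order, columns would simply be processed left to right (`[0, 1, 2, 3]` here).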
Tested for compatibility with:

- WSL with the GPTQ-for-Llama `triton` branch.
- Windows with AutoGPTQ on `cuda` (triton deselected).
The AutoGPTQ loader should read its configuration from `quantize_config.json`.
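If you want to check what a downloaded copy of `quantize_config.json` contains before loading it, a small sketch like the following could help. It assumes the file stores keys named after the config options (`bits`, `group_size`, `desc_act`, ...) with `group_size = -1` meaning no group size; exact keys may differ between AutoGPTQ versions, and the helpers `read_quantize_config` and `describe` are hypothetical:

```python
# Hypothetical check of quantize_config.json before loading.
# Assumes group_size = -1 means "no group size"; keys may vary by AutoGPTQ version.
import json
import tempfile
from pathlib import Path


def read_quantize_config(model_dir: Path) -> dict:
    """Load quantize_config.json from a model directory."""
    with open(model_dir / "quantize_config.json", encoding="utf-8") as f:
        return json.load(f)


def describe(cfg: dict) -> str:
    """Summarize the settings that matter for loader compatibility."""
    group = cfg.get("group_size", -1)
    group_text = "no group size" if group == -1 else f"group size {group}"
    act_text = "act order" if cfg.get("desc_act") else "no act order"
    return f"{cfg['bits']}-bit, {group_text}, {act_text}"


if __name__ == "__main__":
    # Write an illustrative file to a temporary directory, then read it back.
    sample = {
        "bits": 8,
        "group_size": -1,
        "desc_act": True,
        "true_sequential": True,
        "model_file_base_name": "WizardLM-7B-Uncensored",
    }
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp)
        (model_dir / "quantize_config.json").write_text(json.dumps(sample), encoding="utf-8")
        print(describe(read_quantize_config(model_dir)))
```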
For GPTQ-for-Llama, use the following configuration when loading:

```
wbits: 8
groupsize: None
model_type: llama
```
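The GPTQ-for-Llama settings correspond directly to the quantization options: `wbits` is the bit width and `groupsize: None` reflects that no group size was used. A minimal sketch of that mapping, assuming an AutoGPTQ-style config dict where `group_size = -1` means no grouping (the helper `to_gptq_for_llama` is hypothetical):

```python
# Hypothetical helper: translate AutoGPTQ-style quantize_config fields into the
# GPTQ-for-Llama loader settings (wbits, groupsize, model_type).
# Assumes group_size = -1 means "no grouping", written as groupsize: None.
from typing import Optional, TypedDict


class GptqForLlamaSettings(TypedDict):
    wbits: int
    groupsize: Optional[int]
    model_type: str


def to_gptq_for_llama(quantize_config: dict, model_type: str = "llama") -> GptqForLlamaSettings:
    """Map bits/group_size to wbits/groupsize for loading with GPTQ-for-Llama."""
    group_size = quantize_config.get("group_size", -1)
    return {
        "wbits": quantize_config["bits"],
        "groupsize": None if group_size == -1 else group_size,
        "model_type": model_type,
    }


if __name__ == "__main__":
    print(to_gptq_for_llama({"bits": 8, "group_size": -1, "desc_act": True}))
```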