Squish42/WizardLM-7B-Uncensored-GPTQ-act_order-8bit

ehartford/WizardLM-7B-Uncensored quantized to 8-bit GPTQ with act order and true sequential, and no group size.
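To illustrate what "no group size" means, here is a minimal sketch. It is not the AutoGPTQ implementation. It uses plain round-to-nearest with an asymmetric min/max grid. GPTQ also corrects rounding error column by column using second-order (Hessian) information, and act order (desc_act) only changes the order in which columns are processed (largest Hessian diagonal first). Neither of those steps is modelled here. The function name quantize_rtn and the group_size=2 contrast are hypothetical.

```python
def quantize_rtn(row, bits=8, group_size=-1):
    """Round-to-nearest quantization of one weight row (one output channel).

    Illustrative sketch only, not GPTQ. group_size=-1 ("no group size") uses a
    single (scale, zero) pair for the whole row. A positive group_size gives
    each chunk of that many weights its own pair. Returns the dequantized
    weights and the list of (scale, zero) pairs.
    """
    qmax = 2 ** bits - 1  # highest integer code: 255 for 8 bits
    size = len(row) if group_size == -1 else group_size
    dequant, params = [], []
    for start in range(0, len(row), size):
        chunk = row[start:start + size]
        # Min/max range, widened to include 0 so zero stays exactly representable.
        lo, hi = min(min(chunk), 0.0), max(max(chunk), 0.0)
        scale = (hi - lo) / qmax or 1.0  # avoid a zero scale for an all-zero chunk
        zero = round(-lo / scale)
        for w in chunk:
            q = min(max(round(w / scale) + zero, 0), qmax)  # integer code in [0, qmax]
            dequant.append(scale * (q - zero))
        params.append((scale, zero))
    return dequant, params


row = [0.5, -1.2, 3.3, 0.0, -0.7, 2.1, 1.4, -2.8]
_, one_pair = quantize_rtn(row)                  # no group size: one pair per row
_, four_pairs = quantize_rtn(row, group_size=2)  # hypothetical grouping, for contrast
print(len(one_pair), len(four_pairs))  # 1 4
```

Without grouping, each output channel stores only one scale and zero point. This gives the smallest overhead, but one outlier in a row stretches the grid for every weight in that row.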

For most uses, this is probably not the quantization you want.
For a 4-bit quantization without act order, or for compatibility with old-cuda (the text-generation-webui default), see TheBloke/WizardLM-7B-uncensored-GPTQ.

Quantized using AutoGPTQ with the following config:

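config: dict = dict(
    # 8-bit weights with act order (desc_act) and true sequential; group_size is left unset (no group size)
    quantize_config=dict(bits=8, desc_act=True, true_sequential=True, model_file_base_name='WizardLM-7B-Uncensored'),
    # save the quantized weights in .safetensors format
    use_safetensors=True
)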

See quantize.py for the full script.

Tested for compatibility with:

WSL with the GPTQ-for-Llama triton branch.
Windows with AutoGPTQ on CUDA (triton deselected).

The AutoGPTQ loader should read its configuration from quantize_config.json.
For GPTQ-for-Llama, use the following settings when loading:
wbits: 8
groupsize: None
model_type: llama
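The GPTQ-for-Llama settings above correspond to values stored in quantize_config.json. The sketch below derives one from the other. It assumes the file stores the bit width under "bits" and the group size under "group_size", with -1 meaning no grouping. This is an assumption about AutoGPTQ's file format, so check it against the file in this repo. The helper name gptq_for_llama_settings is hypothetical, and model_type is hard-coded to the value listed above.

```python
import json
import tempfile
from pathlib import Path


def gptq_for_llama_settings(config_path):
    """Translate an AutoGPTQ-style quantize_config.json into GPTQ-for-Llama load settings.

    Assumes "bits" holds the bit width and "group_size" the group size,
    where -1 (or a missing key) means no grouping.
    """
    cfg = json.loads(Path(config_path).read_text())
    group_size = cfg.get("group_size", -1)
    return {
        "wbits": cfg["bits"],
        "groupsize": None if group_size == -1 else group_size,
        "model_type": "llama",  # fixed value, as listed for this model
    }


# Usage with a stand-in config file that mirrors this repo's quantization settings.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "quantize_config.json"
    path.write_text(json.dumps({"bits": 8, "group_size": -1, "desc_act": True, "true_sequential": True}))
    print(gptq_for_llama_settings(path))  # {'wbits': 8, 'groupsize': None, 'model_type': 'llama'}
```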