Quantized version of this: https://huggingface.co/TheBloke/stable-vicuna-13B-HF
A big thank you to TheBloke for uploading the HF version above. Unfortunately, his GPTQ quant doesn't run on 0cc4m's fork of KAI/GPTQ, so I am uploading one that does.
GPTQ quantization using https://github.com/0cc4m/GPTQ-for-LLaMa for compatibility with 0cc4m's fork of KoboldAI.
Command used to quantize:

```
python llama.py c:\stable-vicuna-13B-HF c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors 4bit-128g.safetensors
```
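As an optional sanity check (not part of the original card), the resulting file can be opened with the `safetensors` library to confirm the packed GPTQ tensors were written:

```python
# Minimal sketch: verify the quantized output file loads and inspect a few
# of the packed GPTQ tensors (qweight, qzeros, scales, etc.).
from safetensors.torch import load_file

state = load_file("4bit-128g.safetensors")  # path produced by the command above
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)
```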
This model works best with the prompt format below. Note that it does not reliably stop on its own and will likely keep generating indefinitely if you let it, so trim the output at the next `### Human:` turn (see the sketch after the example).
```
### Human:
What is 2+2?
### Assistant:
```
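A minimal sketch (not from the original card) of building the prompt in this format and trimming the reply at the next `### Human:` turn, since the model tends to keep generating past its answer:

```python
# Build a prompt in the format above and keep only the assistant's turn.
def build_prompt(question: str) -> str:
    return f"### Human:\n{question}\n### Assistant:\n"

def trim_reply(generated_text: str, prompt: str) -> str:
    # Drop the prompt, then discard anything after a new "### Human:" marker.
    reply = generated_text[len(prompt):]
    return reply.split("### Human:", 1)[0].strip()

prompt = build_prompt("What is 2+2?")
# `generated_text` would come from whichever backend runs the quantized model
# (e.g. 0cc4m's KoboldAI fork); here it is a stand-in string for illustration.
generated_text = prompt + "2+2 equals 4.\n### Human:\nAnd 3+3?"
print(trim_reply(generated_text, prompt))  # -> "2+2 equals 4."
```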