
Can someone share a GGML 4-bit version?

#2
by alphaprime90 - opened

For CPU users

I have provided GGML files here: https://huggingface.co/TheBloke/stable-vicuna-13B-GGML

I also have 4-bit GPTQ files for lower-VRAM GPU inference here: https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ

Legend. Thank you.

alphaprime90 changed discussion status to closed
