CUDA supports all the GPTQ implementations now
#1 opened by TheYuriLover
Hello,
CUDA_VISIBLE_DEVICES=0 python llama.py ./models/chavinlo-gpt4-x-alpaca c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save gpt-x-alpaca-13b-native-4bit-128g-cuda.pt
I saw that your CUDA quantization doesn't have act-order in it. You should quantize it again (with a command like the one above), because it looks like qwopqwop200 has finally combined all the implementations in the CUDA branch:
https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/cuda
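If it helps, here's a rough sketch for sanity-checking the resulting checkpoint with that same branch. The llama_inference.py script and its flags are taken from the repo's README, and the model path and checkpoint name are just reused from the command above:

CUDA_VISIBLE_DEVICES=0 python llama_inference.py ./models/chavinlo-gpt4-x-alpaca --wbits 4 --groupsize 128 --load gpt-x-alpaca-13b-native-4bit-128g-cuda.pt --text "this is llama"

If that produces coherent output, the act-order quant loaded correctly.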