Transformers bitsandbytes 4-bit quantization does not work well; QLoRA also fails
#4
by Yhyu13 - opened
Hi,
I am not sure if you have tried Transformers inference or not, but internlm2 does not seem to work properly under bitsandbytes 4-bit quantization: it keeps generating self Q&A turns without stopping.
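For anyone trying to reproduce, here is a minimal sketch of the standard bitsandbytes 4-bit load I mean; the checkpoint name and generation settings are placeholders rather than my exact script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "internlm/internlm2-chat-7b"  # assumed checkpoint

# Standard NF4 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# With 4-bit weights, generation degenerates into endless self Q&A turns
inputs = tokenizer("Hello, who are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```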
QLoRA fine-tuning via Accelerate also fails on internlm2, with an error that some tensor has no grad.
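For context, the QLoRA setup followed the usual peft recipe; a rough sketch, where the target_modules names (wqkv, wo) are my assumption about internlm2's attention layer names:

```python
# `model` is the 4-bit internlm2 model loaded as in the snippet above
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# prepare_model_for_kbit_training enables input gradients and upcasts layer
# norms; skipping it is one common cause of the "tensor has no grad" error
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["wqkv", "wo"],  # assumed internlm2 attention module names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```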
float16 works in all cases.
It is not a big deal, since internlm2 mainly supports the LMDeploy framework with its own 4-bit quantization rather than Transformers.
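For comparison, a sketch of the LMDeploy route with its own W4A16 (AWQ) quantization; the pre-quantized repo name here is an assumption:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    "internlm/internlm2-chat-7b-4bits",  # assumed pre-quantized AWQ repo
    backend_config=TurbomindEngineConfig(model_format="awq"),
)
print(pipe(["Hello, who are you?"]))
```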
Thanks!
Hi @Yhyu13,
Could you please refer to this doc https://github.com/InternLM/InternLM/pull/636 and see if it solves your issue?