Transformers bitsandbytes 4-bit quantization does not work well; QLoRA also fails
#4
by Yhyu13 - opened
Hi,
I am not sure if you have tried Transformers inference or not, but internlm2 does not seem to work properly under bitsandbytes 4-bit quantization: it keeps generating self Q&A turns without stopping.
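For anyone trying to reproduce, here is a minimal sketch of the standard bitsandbytes 4-bit load I mean; the checkpoint name and generation settings are placeholders rather than my exact script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "internlm/internlm2-chat-7b"  # assumed checkpoint

# Standard NF4 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# With 4-bit weights, generation degenerates into endless self Q&A turns
inputs = tokenizer("Hello, who are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```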
QLoRA fine-tuning via Accelerate also fails on internlm2, with an error that some tensor has no grad.
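For context, the QLoRA setup followed the usual peft recipe; a rough sketch, where the target_modules names (wqkv, wo) are my assumption about internlm2's attention layer names:

```python
# `model` is the 4-bit internlm2 model loaded as in the snippet above
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# prepare_model_for_kbit_training enables input gradients and upcasts layer
# norms; skipping it is one common cause of the "tensor has no grad" error
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["wqkv", "wo"],  # assumed internlm2 attention module names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```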
float16 works in all cases.
It is not a big deal, since internlm2 mainly supports the LMDeploy framework with its own 4-bit quantization rather than Transformers.
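For comparison, a sketch of the LMDeploy route with its own W4A16 (AWQ) quantization; the pre-quantized repo name here is an assumption:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

pipe = pipeline(
    "internlm/internlm2-chat-7b-4bits",  # assumed pre-quantized AWQ repo
    backend_config=TurbomindEngineConfig(model_format="awq"),
)
print(pipe(["Hello, who are you?"]))
```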
Thanks!
Hi @Yhyu13,
Could you please refer to this doc https://github.com/InternLM/InternLM/pull/636 and see if it solves your issue?