4bit GPTQ model available for anyone interested

by TheBloke - opened

I've done a 4bit GPTQ conversion of this model, which is available here: https://huggingface.co/TheBloke/GPT4All-13B-snoozy-GPTQ
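
For anyone who wants a quick way to try it, loading the model with AutoGPTQ looks roughly like this. This is a minimal sketch; check the repo README for the exact options, and you may need to pass `model_basename` matching the checkpoint filename in the repo:

```python
# Minimal inference sketch for the GPTQ model using AutoGPTQ (pip install auto-gptq).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/GPT4All-13B-snoozy-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# If loading fails, pass model_basename= with the checkpoint filename from the repo.
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0", use_safetensors=True)

prompt = "Tell me about quantisation."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```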

Nomic AI org

awesome thanks so much!!

@TheBloke is it possible to add your own additional training to "gpt4all-13b-snoozy"? So let's say I have my own data I want to train on and give a heavier weight, then merge back with this; is that possible?

Yes, it is possible to apply additional training on top of an already-trained model like this one.

There are four methods:

  1. Get the unquantised model from this repo and apply a new full training on top of it, i.e. similar to what GPT4All did to train this model in the first place, but using their model as the base instead of raw Llama;
  2. Get the unquantised model from this repo and apply a LoRA fine-tuning;
  3. Get the unquantised model from this repo and apply a QLoRA fine-tuning;
  4. Get the quantised GPTQ model from my repo and apply a LoRA training using the new AutoGPTQ PEFT code added in version 0.3.0, which is not yet released, so you'd need to compile it from source.

Method 1 is arguably the ideal method in terms of training quality, but it's also by far the most expensive. To do it you'd need 4 x A100 40GB, or similar hardware.
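
For illustration, a full fine-tune with the Hugging Face Trainer would look roughly like this. This is a minimal sketch: the dataset, hyperparameters and DeepSpeed config are placeholders you'd replace with your own:

```python
# Rough sketch of method 1: full fine-tuning of the unquantised model.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "nomic-ai/gpt4all-13b-snoozy"  # the unquantised model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers have no pad token
model = AutoModelForCausalLM.from_pretrained(base)  # full-precision weights, ~26GB

# Toy dataset: replace with your own instruction/response data.
ds = Dataset.from_dict({"text": ["### Instruction: say hi\n### Response: hi"]})
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=2048),
            remove_columns=["text"])

args = TrainingArguments(
    output_dir="snoozy-full-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    bf16=True,
    deepspeed="ds_config.json",  # placeholder: ZeRO sharding across e.g. 4 x A100 40GB
)
trainer = Trainer(model=model, args=args, train_dataset=ds,
                  data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()
```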

Method 2 could be done on a single A100 80GB, or probably a single A6000 48GB.
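
A LoRA fine-tune with the peft library looks roughly like this. A minimal sketch, with typical (not prescriptive) settings for a Llama-family model:

```python
# Rough sketch of method 2: LoRA fine-tuning with peft (pip install peft).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "nomic-ai/gpt4all-13b-snoozy", torch_dtype="auto", device_map="auto")

lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Llama attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small adapter matrices are trained
# ...then train with the Trainer exactly as in the method 1 sketch.
```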

Method 3 could be done on a consumer GPU, like a 24GB 3090 or 4090, or possibly even a 16GB GPU.
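
QLoRA is the same idea, except the base model is first loaded in 4bit NF4 via bitsandbytes, which is what lets it fit on a 24GB card. A minimal sketch, assuming recent transformers/peft/bitsandbytes versions:

```python
# Rough sketch of method 3: QLoRA -- 4bit base model plus a LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # the QLoRA paper's NF4 data type
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "nomic-ai/gpt4all-13b-snoozy", quantization_config=bnb_cfg, device_map="auto")

model = prepare_model_for_kbit_training(model)  # casts layer norms, enables checkpointing
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))
# training then proceeds exactly as with a normal LoRA.
```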

Method 4 could also be done on a consumer GPU and may be a bit faster than method 3. It has the advantage that you don't need to download the full 26GB base model, only the 4bit GPTQ. I don't know how its quality compares to method 3.
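
Based on AutoGPTQ's new peft_utils code, a method 4 sketch would look roughly like this. The API is brand new and unreleased, so treat the exact names and arguments as approximate:

```python
# Rough sketch of method 4: LoRA on top of the 4bit GPTQ model via AutoGPTQ's
# PEFT integration (needs AutoGPTQ >= 0.3.0, compiled from source for now).
from auto_gptq import AutoGPTQForCausalLM
from auto_gptq.utils.peft_utils import GPTQLoraConfig, get_gptq_peft_model

model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/GPT4All-13B-snoozy-GPTQ",
    device="cuda:0",
    use_triton=True,   # the triton kernels are needed for the backward pass
    trainable=True,
)
peft_cfg = GPTQLoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
model = get_gptq_peft_model(model, peft_config=peft_cfg, train_mode=True)
# train the LoRA adapter as usual; the 4bit base weights stay frozen.
```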

Generally speaking I believe methods 2, 3 and 4 will all have similar training quality: lower than method 1, but definitely acceptable.

A lot of people are using QLoRA now, since it came out a few weeks ago. If you browse the Hugging Face Hub a lot you may well have seen QLoRA mentioned in a number of repos. There are a number of tutorial videos on it, like this one, which I can recommend: https://youtu.be/8vmWGX1nfNM . And here's a blog post tutorial on it: https://animal-machine.com/posts/fine-tuning-llama-models-with-qlora-and-axolotl/

AutoGPTQ PEFT is brand new and not many people are using it yet, but it has just been integrated into text-generation-webui, so it's easy to try in that UI.
