Add GPTQ-loading with 🤗 transformers lib
#3 opened by marksverdhei
This pull request converts the model files to the same format used by TheBloke's models, so that the model can be loaded directly with the 🤗 transformers library.
This includes:
- Adding the quantization config to `config.json` (see the first sketch after this list)
- Adding the following metadata to `model.safetensors` (see the second sketch): `{"format": "pt", "quantized_by": "RuterNorway"}`
Then, given that the libraries transformers, optimum, and auto-gptq are installed, you should be able to load the model like this:
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RuterNorway/Llama-2-13b-chat-norwegian-GPTQ")
model = AutoModelForCausalLM.from_pretrained("RuterNorway/Llama-2-13b-chat-norwegian-GPTQ")
```
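As a quick smoke test, a short generation sketch building on the snippet above. The prompt is arbitrary, and GPTQ inference normally requires a GPU, so you may additionally need `device_map="auto"` when loading:

```python
# Smoke test: generate a short completion to verify the model loaded correctly.
# The Norwegian prompt is arbitrary; inputs are moved to wherever the model sits.
inputs = tokenizer("Hva er hovedstaden i Norge?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```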
marksverdhei changed pull request title from "GPTQ-loading with 🤗 transformers lib" to "Add GPTQ-loading with 🤗 transformers lib"
Great work. Tested and works as expected.
RuterNorway changed pull request status to open
RuterNorway changed pull request status to merged