Add GPTQ-loading with 🤗 transformers lib
#3 opened by marksverdhei
This pull request converts the model files to the same format used by TheBloke's models, so that the model can be loaded directly with the 🤗 transformers library.
This includes:
- Adding the quantization config to `config.json` (see the first sketch after this list)
- Adding the following metadata to `model.safetensors` (see the second sketch): `{"format": "pt", "quantized_by": "RuterNorway"}`
Then, given that the libraries transformers, optimum, and auto-gptq are installed, you should be able to load the model like this:
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("RuterNorway/Llama-2-13b-chat-norwegian-GPTQ")
model = AutoModelForCausalLM.from_pretrained("RuterNorway/Llama-2-13b-chat-norwegian-GPTQ")
```
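As a quick smoke test, a short generation sketch building on the snippet above. The prompt is arbitrary, and GPTQ inference normally requires a GPU, so you may additionally need `device_map="auto"` when loading:

```python
# Smoke test: generate a short completion to verify the model loaded correctly.
# The Norwegian prompt is arbitrary; inputs are moved to wherever the model sits.
inputs = tokenizer("Hva er hovedstaden i Norge?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```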
marksverdhei changed pull request title from "GPTQ-loading with 🤗 transformers lib" to "Add GPTQ-loading with 🤗 transformers lib"
Great work. Tested and works as expected.
RuterNorway changed pull request status to open
RuterNorway changed pull request status to merged