Running on RTX 3090
I read that Nemo runs on an RTX 4090. I have a 3090, which I understand has the same amount of VRAM, but when I try the sample code I get an out-of-memory error.
What do I need to do to try out this model on a 3090?
Hi Alan, the code in the README runs at 16-bit precision, which needs around 28 GB of VRAM (and yours has 24 GB). However, this model was designed to run losslessly at 8-bit precision, meaning you can do inference at fp8 and it will fit in 16 GB of VRAM without any issue! I also invite you to look into quantization.
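For example, here is a rough sketch of one way to try fp8 with vLLM. This assumes vLLM's quantization="fp8" online quantization option; on an Ampere card like the 3090 it should fall back to weight-only fp8 rather than native fp8 compute, and the max_model_len cap is only there to keep the KV cache small. Treat it as a starting point, not a verified recipe:

from vllm import LLM, SamplingParams

# Quantize the weights to fp8 on the fly; cap the context length so the
# KV cache also fits in 24 GB of VRAM.
llm = LLM(
    model="mistralai/Mistral-Nemo-Base-2407",
    quantization="fp8",
    max_model_len=8192,
)

params = SamplingParams(max_tokens=20)
print(llm.generate(["Hello my name is"], params)[0].outputs[0].text)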
Also note that this repo is for the base model, meaning it's for raw text completion; for instruction following and chatting with the model, I recommend the Instruct version.
@pandora-s Is it possible to run this model in fp8 without doing quantization? In vLLM I tried setting the dtype and the KV cache dtype to fp8, but nothing worked.
For anyone else reading this, here is the code I used to run it on my local GPU:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-Nemo-Base-2407"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the weights with bitsandbytes 8-bit quantization so the 12B model
# fits comfortably in 24 GB of VRAM.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

# return_token_type_ids=False stops the tokenizer from returning
# token_type_ids, which generate() doesn't expect.
inputs = tokenizer("Hello my name is", return_tensors="pt", return_token_type_ids=False).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Took about 13GB of GPU RAM
I'm pretty sure that's 8-bit integer quantization, which is not the quantization the model was trained for: "8-bit quantization multiplies outliers in fp16 with non-outliers in int8, converts the non-outlier values back to fp16" (from https://github.com/huggingface/transformers/blob/main/docs/source/en/quantization/bitsandbytes.md).
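If you specifically want fp8-format weights through transformers rather than LLM.int8(), one option might be the optimum-quanto integration. This is only a sketch (it assumes QuantoConfig's float8 weight option and requires pip install optimum-quanto), and on a 3090 the matmuls still run in higher precision, so it's weight-only fp8 storage rather than true fp8 inference:

from transformers import AutoModelForCausalLM, AutoTokenizer, QuantoConfig

model_id = "mistralai/Mistral-Nemo-Base-2407"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Store the weights as float8 via optimum-quanto; activations stay in
# bf16/fp32, so this is weight-only fp8 rather than native fp8 compute.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    quantization_config=QuantoConfig(weights="float8"),
    device_map="auto",
)

inputs = tokenizer("Hello my name is", return_tensors="pt", return_token_type_ids=False).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))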