Run inference on 2 GPUs

#112
by bweinstein123 - opened

Hi,

I have two RTX 6000 GPUs, but I can't figure out how to run the following code on both GPUs.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.half().cuda()  # fp16 weights, but .cuda() places the entire model on a single GPU

text = "Hello, how are you?"  # placeholder prompt; `text` was undefined in the original snippet
inputs = tokenizer(text, return_tensors="pt")
inputs_gpu = {key: value.to("cuda") for key, value in inputs.items()}

outputs = model.generate(**inputs_gpu, max_new_tokens=500)

Hi @bweinstein123,
Please see my comment here: https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/discussions/36#65b8d5cf23d948d884d19645 for how to run multi-GPU inference; a sketch of the usual approach follows below.
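In short, instead of calling .cuda() (which pins everything to one device), you can let Transformers shard the model across both GPUs with device_map="auto". A minimal sketch, assuming the accelerate package is installed and using a placeholder prompt string:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets Accelerate split the layers across all visible GPUs,
# so the 7B model's weights are spread over both cards automatically
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

text = "Hello, how are you?"  # placeholder prompt
# inputs go to the device holding the first layers (usually cuda:0)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With this setup the intermediate activations move between GPUs as generation proceeds, so no manual .to("cuda") bookkeeping is needed for the model itself.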
