[ERROR]: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 22.88 GiB. GPU

#4
by Axinx - opened

Hello! When I deploy and run this model locally, I hit a strange error. When I pass messages shorter than 16K tokens, everything is fine: the model returns results, just a little slowly. But when I pass a message between 16K and 30K tokens, I get the error above. My machine has five NVIDIA RTX A6000 GPUs, 48 GB each (240 GB total), so I am confused about why this error happens. When I load the model from local storage, I already set the param device_map="auto". I would appreciate some help with this, thank you!

The model I used is deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct.
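For reference, this is roughly the loading code I use (the path is a placeholder for my local copy of the model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to my local copy of the model
model_path = "/path/to/DeepSeek-Coder-V2-Lite-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",   # shard the layers across the five A6000s
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "..."}]  # the long prompt goes here
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```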

@Axinx - What are you using to run the model locally? llama.cpp, vLLM, or something else?

The model card says 80GB*8 GPUs are needed to run the model.
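That 80GB*8 figure is for BF16 inference of the full 236B DeepSeek-Coder-V2, though; the Lite variant is only 16B parameters and should fit on your setup. Since it only fails past ~16K tokens, my guess is the eager attention path: it materializes a seq_len × seq_len score matrix per head, which grows quadratically with context length. A rough back-of-envelope, assuming the Lite config's 16 attention heads (check your local config.json):

```python
def attn_scores_gib(seq_len: int, num_heads: int = 16, bytes_per_el: int = 2) -> float:
    """Approximate size of one layer's eager-attention score matrix
    at batch size 1: num_heads * seq_len^2 * bytes per element."""
    return num_heads * seq_len**2 * bytes_per_el / 2**30

for n in (8_000, 16_000, 30_000):
    print(f"{n:>6} tokens -> {attn_scores_gib(n):6.2f} GiB per layer")
# ~1.9 GiB at 8K, ~7.6 GiB at 16K, ~26.8 GiB at 30K -- the same
# ballpark as the 22.88 GiB allocation in the error message.
```

If that is the cause, passing attn_implementation="flash_attention_2" to from_pretrained (with the flash-attn package installed) avoids materializing that matrix and should make the long prompts fit.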
