Enhancement Request: Model Sharding for DeepSeek-Coder-6.7b-Instruct

#4
by Firejowl - opened

Dear DeepSeek AI Team,

Greetings! I am reaching out to discuss the DeepSeek-Coder-6.7b-Instruct model and how its accessibility could be further improved. As someone eager to use this model, I have run into constraints on lower-end hardware. I therefore propose model sharding as a potential enhancement.

The introduction of a sharded version would be a significant step towards inclusivity, allowing those with less powerful machines to still take advantage of the model's capabilities. This not only benefits individual hobbyists and researchers with resource constraints but also enhances the utility of the model on cloud-based platforms where optimized resource usage is essential, such as Google Colab and Kaggle.

Understanding that model sharding entails technical complexities, I am hopeful that its implementation could widen the user base and foster a more diverse range of applications and innovations.

I am keen to hear your perspective on this suggestion, and on any other options that could help users like myself overcome hardware limitations.

Thank you for your pioneering work and for considering this request.

DeepSeek org

I don't really get it. Do you want to finetune this model or just run inference with it? If you want to finetune it on low-end hardware, I'd recommend the QLoRA algorithm; if you only want to run inference, I'd recommend a quantized version of the model (e.g., the one from TheBloke).
Model sharding is probably not what you need for this use case.
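
To make both suggestions concrete, here is a minimal sketch, assuming the `deepseek-ai/deepseek-coder-6.7b-instruct` checkpoint and the `transformers`, `bitsandbytes`, `accelerate`, and `peft` libraries. It is illustrative only, not an official recipe; in particular, the LoRA `target_modules` names are an assumption for a LLaMA-style architecture.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"

# Inference on low-end hardware: load the weights in 4-bit (NF4), so the
# 6.7B model needs roughly 4-5 GB of GPU memory instead of ~13 GB in fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers on GPU/CPU as needed
)

# Finetuning on low-end hardware (QLoRA): wrap the same 4-bit model with
# LoRA adapters so only a small set of adapter weights is trained.
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed projection names
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```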

Chester111 changed discussion status to closed

If you shard the model, you can run it through transformers on either cloud platform. This avoids hosted-inference rate limits and lets people with fewer financial resources still access modern technology.

Here is an example:
https://youtu.be/c_S_KGRUzoY
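
For what it's worth, a sharded copy can be produced from the released weights with transformers' own `save_pretrained`. A rough sketch (the output directory name is just illustrative, and loading in fp16 assumes a machine with enough RAM to hold the full model once):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"

# Load once in fp16 (~13 GB of RAM), then re-save with a small
# max_shard_size so each weight file stays under ~2 GB.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained("deepseek-coder-6.7b-sharded", max_shard_size="2GB")
tokenizer.save_pretrained("deepseek-coder-6.7b-sharded")

# On Colab/Kaggle, the sharded copy can then be loaded shard by shard:
# AutoModelForCausalLM.from_pretrained("deepseek-coder-6.7b-sharded",
#                                      device_map="auto")
```

With small shards, platforms like Colab and Kaggle can stream the checkpoint onto limited RAM/VRAM instead of loading one monolithic weight file.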

Firejowl changed discussion status to open
