Could not run on Colab
#15 opened by MosheBeeri
It just consumes all available memory: 51 GB of RAM and 16 GB of GPU RAM. Any idea besides using a machine with more RAM?
Of course; even a 13B model needs a V100 32GB to run, so the 40B model must need even more!
The model weights alone are ~80 GB (40B parameters × 2 bytes each in bfloat16), so fast inference would require at least 90-100 GB of GPU memory.
You can try to see if you can get accelerate with cpu offloading to work: https://huggingface.co/docs/accelerate/package_reference/big_modeling
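For reference, a minimal sketch of what that can look like with `transformers` + `accelerate` is below. The `max_memory` limits and `offload_folder` path are illustrative assumptions; tune them to your hardware.

```python
# Sketch: CPU/disk offloading via accelerate's device_map support.
# Assumes `transformers` and `accelerate` are installed; memory limits are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,                    # Falcon shipped custom modelling code
    device_map="auto",                         # accelerate fills the GPU first, then CPU
    max_memory={0: "14GiB", "cpu": "48GiB"},   # leave headroom on a 16GB GPU
    offload_folder="offload",                  # spill any remainder to disk
)

inputs = tokenizer("Hello, Falcon!", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Be warned that heavy CPU/disk offloading makes generation very slow; it mainly helps you fit the model at all.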
The community has also created a 4-bit quantised version of the model: https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ, which should only require ~20GB for the model weights.
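Loading that with the AutoGPTQ library looks roughly like the sketch below; the exact arguments (e.g. whether a `model_basename` is required) may differ, so check the snippet on TheBloke's model card.

```python
# Sketch: loading the community 4-bit GPTQ weights with AutoGPTQ.
# Argument values are assumptions based on typical GPTQ model cards.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/falcon-40b-instruct-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,
    trust_remote_code=True,
)

inputs = tokenizer("Hello, Falcon!", return_tensors="pt").to("cuda:0")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```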
Otherwise, the best bet would be to work with one of the smaller models: https://huggingface.co/tiiuae/falcon-7b
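The 7B model is ~14 GB of weights in bfloat16, so it should fit on a single 16GB GPU for short generations. Usage is essentially the model card's snippet:

```python
# Sketch: text generation with falcon-7b, adapted from its model card.
import torch
import transformers
from transformers import AutoTokenizer

model_id = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

sequences = pipeline(
    "Write a haiku about GPUs:",
    max_new_tokens=50,
    do_sample=True,
    top_k=10,
)
print(sequences[0]["generated_text"])
```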
FalconLLM changed discussion status to closed