How to load into Colab

#1
by ghpkishore - opened

I cannot seem to load the model locally in Colab using git; it shows that setup.py is missing. Also, when I try the normal method of "from transformers import", I am not able to load it because the RAM runs out. I am using a Google Colab Pro account. Is there a way for me to resolve this?

Hi,

Looking at the docs, the weights are in float16 format, meaning that 16 bits or 2 bytes are used to store each parameter.

That means that, for a 20-billion-parameter model, you need 20 billion parameters × 2 bytes per parameter = 40 billion bytes, i.e. 40 GB. That's the amount of RAM required just to load the weights.
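The same back-of-the-envelope calculation in code:

```python
# Back-of-the-envelope estimate: parameters x bytes per parameter.
n_params = 20e9               # ~20 billion parameters
bytes_per_param = 2           # float16 = 16 bits = 2 bytes
print(n_params * bytes_per_param / 1e9, "GB")  # -> 40.0 GB
```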

EleutherAI org

That’s not quite correct. GPT-NeoX-20B was trained using mixed precision (fp32/fp16). These weights are in fp32, which is why the docs mention using .half() before loading the model onto the GPU.

I’m not sure what GPUs you are able to get via Colab, but inference with this model typically requires more than 40 GB of VRAM.
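For reference, the pattern the docs describe looks roughly like this; the fp32 checkpoint is loaded into CPU RAM first, which is what drives the memory requirement:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loading the fp32 checkpoint takes roughly 80 GB of CPU RAM;
# after .half(), the weights alone occupy ~40 GB of GPU VRAM.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
model = model.half().to("cuda")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```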

Oh, my apologies. I read in the docs that "GPT-NeoX-20B was trained with fp16"; I guess that should be corrected.

Also, I think it may be beneficial to add the RAM requirements to the docs as well, similar to the "tips" section of GPT-J.

Do you think it would be beneficial to have a separate branch on this repo with float16 weights?
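For context, GPT-J serves its fp16 weights from a float16 branch selected via the revision argument of from_pretrained; a NeoX equivalent would look roughly like the sketch below (the branch name is hypothetical here, since no such branch exists on this repo):

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical: mirrors GPT-J's setup; this repo has no float16 branch (yet).
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    revision="float16",         # hypothetical branch holding fp16 weights
    torch_dtype=torch.float16,  # load directly in fp16, skipping the fp32 copy
)
```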

cc'ing @sgugger regarding whether or not this model can be loaded into Google Colab using Accelerate's big model inference feature.

Not on Colab free, no; they don't provide enough disk space to even download the weights.
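For anyone running in an environment with enough disk, big model inference is enabled by passing device_map="auto" to from_pretrained; a minimal sketch (the offload folder is just an example path):

```python
import torch
from transformers import AutoModelForCausalLM

# Requires `pip install accelerate`. device_map="auto" spreads the weights
# across available GPU(s), CPU RAM, and, if needed, disk offload.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    device_map="auto",
    torch_dtype=torch.float16,
    offload_folder="offload",  # example path, used only if disk offload kicks in
)
```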

@stellaathena I'm surprised to learn the model was trained in fp16 (not bfloat16?), as we get crappy generations in fp16 but decent ones in bfloat16 in our tests.

Edit: Looks like it was only a bug in the Transformers implementation. https://github.com/huggingface/transformers/pull/17811 should fix the float16 generations.

Thanks for letting me know and for fixing the issue, @sgugger. I will upgrade to Colab Pro and see if it can run there.

stellaathena changed discussion status to closed

I have tried running the model in Colab Pro but failed, as it only has 39-40 GB of GPU RAM.

GitHub Codespaces
