DeepSpeed ZeRO-3 and full finetune
Hi! A question: did you have challenges using DeepSpeed ZeRO-3 with full finetuning? What was the reason for using DeepSpeed ZeRO-2 and QLoRA? I'm asking because we have an issue with LLMs and DeepSpeed ZeRO-3: if you load an LLM with ZeRO-3, save it, and then load it again, the model comes back broken. Did you experience something like that?
I always use QLoRA to save VRAM.
I can't use DeepSpeed ZeRO-3 - I always get error messages.
While it was not used for this model, we have used ZeRO-3 with both full finetuning and LoRA successfully on other models. Depending on the setup we have to run a manual weight-gather step after saving; other than that it seems to work. When we use ZeRO-3 + LoRA we disable optimizer offload, since the LoRA weights are only a small fraction of the total parameters. We have not tested QLoRA.
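The broken save/load behavior described in the question is often caused by ZeRO-3 writing only per-rank shards of the partitioned weights, so a plain reload sees an incomplete state dict. One way to avoid the manual gather step is to ask DeepSpeed to consolidate the 16-bit weights at save time. A minimal config sketch, assuming bf16 training; `stage3_gather_16bit_weights_on_model_save` is the relevant DeepSpeed option, and the remaining values are illustrative placeholders:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

With this flag enabled, saving the model produces a single consolidated 16-bit checkpoint instead of per-rank shards. Alternatively, DeepSpeed writes a `zero_to_fp32.py` script into the checkpoint directory, which can consolidate sharded ZeRO checkpoints into a full fp32 state dict after the fact.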