DeepSpeed ZeRO-3 and full finetune
Hi! A question: did you have challenges using DeepSpeed ZeRO-3 with full finetuning? What was the reason for using DeepSpeed ZeRO-2 and QLoRA? I'm asking because we have an issue with LLMs and DeepSpeed ZeRO-3: if you load an LLM with ZeRO-3, save it, and then load it again, the model comes back broken. Did you experience something like that?
I always use QLoRA to save VRAM.
I can't use DeepSpeed ZeRO-3 - I always get error messages.
While it was not used for this model, we have used ZeRO-3 with both full finetuning and LoRA successfully on other models. Depending on the setup we have to run a manual weight-gather step after saving; other than that it seems to work. When we use ZeRO-3 + LoRA we disable optimizer offload, since the LoRA weights are only a small fraction of the total parameters. We have not tested QLoRA.
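The broken save/load behavior described in the question is often caused by ZeRO-3 writing only per-rank shards of the partitioned weights, so a plain reload sees an incomplete state dict. One way to avoid the manual gather step is to ask DeepSpeed to consolidate the 16-bit weights at save time. A minimal config sketch, assuming bf16 training; `stage3_gather_16bit_weights_on_model_save` is the relevant DeepSpeed option, and the remaining values are illustrative placeholders:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```

With this flag enabled, saving the model produces a single consolidated 16-bit checkpoint instead of per-rank shards. Alternatively, DeepSpeed writes a `zero_to_fp32.py` script into the checkpoint directory, which can consolidate sharded ZeRO checkpoints into a full fp32 state dict after the fact.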