Remove deprecated Habana mixed precision from gaudi config

#4
by olszd - opened

Mixed precision was turned off in this model because of the use of DeepSpeed.

Habana AI org

@olszd Could you add "use_torch_autocast": true to the Gaudi config, please? Actually, HMP was enabled, since we have "use_habana_mixed_precision": true.

Also, the regression tests didn't pass because there is an issue with autocast when doing gradient checkpointing. So let's wait for this to be solved before merging this PR.
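
For reference, here is a minimal sketch of the config change requested above, written as a small Python helper that works on a plain dict. Only "use_habana_mixed_precision" and "use_torch_autocast" come from this thread. The helper name `switch_to_autocast`, the assumption that other HMP-related keys share an `hmp_` prefix, and the sample keys in `old_config` are illustrative, not taken from the actual gaudi_config.json.

```python
import json


def switch_to_autocast(config: dict) -> dict:
    """Return a copy of a Gaudi config dict with HMP keys dropped and autocast enabled.

    Assumption: HMP-specific options other than "use_habana_mixed_precision"
    start with the prefix "hmp_".
    """
    updated = {
        key: value
        for key, value in config.items()
        if key != "use_habana_mixed_precision" and not key.startswith("hmp_")
    }
    updated["use_torch_autocast"] = True
    return updated


# Hypothetical starting config; the real file may contain different keys.
old_config = {
    "use_habana_mixed_precision": True,
    "hmp_is_verbose": False,
    "use_fused_adam": True,
}

new_config = switch_to_autocast(old_config)
print(json.dumps(new_config, indent=2))
```

The helper returns a new dict rather than editing in place, so the old config stays available for comparison.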

Habana AI org

The issue with gradient checkpointing was solved, but default autocast is ~20% slower than HMP with custom ops. I tried autocast with custom ops, got similar speeds to HMP, but the loss is NaN. Let's wait for this PR to be merged before doing anything here.

@regisss I've updated the config, can we retest now and merge if the tests pass?

Habana AI org

I updated the Gaudi config with custom bf16/fp32 op lists that give better throughput and similar accuracy, closing this one: https://huggingface.co/Habana/gpt2/blob/main/gaudi_config.json

regisss changed pull request status to closed
Habana AI org

I'm getting better results with autocast when keeping the same custom ops (the ones currently used with HMP) than with default autocast, so I'm going to update that directly.
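
As a rough sketch of what "keeping the same custom ops for autocast" could look like, the helper below copies the HMP op lists into autocast op lists. The key names "hmp_bf16_ops", "hmp_fp32_ops", "autocast_bf16_ops" and "autocast_fp32_ops" are assumptions about the Gaudi config schema, and the op names in the example are placeholders, not the lists actually used for this model.

```python
def carry_over_op_lists(hmp_config: dict) -> dict:
    """Build an autocast config that reuses the custom op lists of an HMP config.

    Assumption: op lists are stored under "hmp_<precision>_ops" and the
    autocast equivalents under "autocast_<precision>_ops".
    """
    autocast_config = {"use_torch_autocast": True}
    for precision in ("bf16", "fp32"):
        ops = hmp_config.get(f"hmp_{precision}_ops")
        if ops:
            # Copy the list so the two configs do not share state.
            autocast_config[f"autocast_{precision}_ops"] = list(ops)
    return autocast_config


# Placeholder op names for illustration only.
hmp_config = {
    "use_habana_mixed_precision": True,
    "hmp_bf16_ops": ["add", "matmul"],
    "hmp_fp32_ops": ["softmax"],
}

autocast_config = carry_over_op_lists(hmp_config)
print(autocast_config)
```

If an HMP list is missing or empty, the corresponding autocast key is simply left out, which in this sketch stands for falling back to the default autocast behavior.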