Remove deprecated Habana mixed precision from gaudi config

by olszd - opened Jul 21, 2023

base: refs/heads/main ← from: refs/pr/4

-24

olszd

Jul 21, 2023

Mixed precision was turned off for this model because DeepSpeed is used.

Remove deprecated Habana mixed precision from gaudi config (06d0a0c7)

regisss

Habana AI org Jul 26, 2023

@olszd Could you add "use_torch_autocast": true to the Gaudi config, please? HMP (Habana mixed precision) was actually enabled, since the config has "use_habana_mixed_precision": true.

Besides, regression tests didn't pass because there is an issue with autocast when doing gradient checkpointing. So let's wait for this to be solved before merging this PR.
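For reference, the requested change amounts to swapping one flag for another in gaudi_config.json. Below is a minimal sketch in plain Python, assuming the only keys involved are the two flags quoted above plus any other HMP-specific entries (taken here to be keys starting with "hmp_", which is an assumption). The sample config and the helper name switch_to_autocast are hypothetical and do not reproduce the actual file.

```python
import json

# Flag names as quoted in this thread; the rest of the schema is an assumption.
HMP_FLAG = "use_habana_mixed_precision"
AUTOCAST_FLAG = "use_torch_autocast"


def switch_to_autocast(config: dict) -> dict:
    """Return a copy of `config` with HMP settings dropped and autocast enabled."""
    updated = {
        key: value
        for key, value in config.items()
        # Assumption: every HMP-specific key is the flag itself or starts with "hmp_".
        if key != HMP_FLAG and not key.startswith("hmp_")
    }
    updated[AUTOCAST_FLAG] = True
    return updated


# Hypothetical starting point, NOT the real contents of gaudi_config.json.
old_config = {
    "use_habana_mixed_precision": True,
    "hmp_is_verbose": False,
    "use_fused_adam": True,
}

new_config = switch_to_autocast(old_config)
print(json.dumps(new_config, indent=2))
```

The helper returns a new dict instead of mutating its input, so the old and new configs can be compared side by side before anything is written back to disk.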

regisss

Habana AI org Jul 26, 2023

The issue with gradient checkpointing was solved, but default autocast is ~20% slower than HMP with custom ops. I tried autocast with custom ops, got similar speeds to HMP, but the loss is NaN. Let's wait for this PR to be merged before doing anything here.

Update gaudi_config.json (d2a32582)

olszd

Sep 26, 2023

@regisss I've updated the config, can we retest now and merge if the tests pass?

regisss

Habana AI org Oct 20, 2023

I updated the Gaudi config with custom bf16/fp32 op lists that give better throughput and similar accuracy, closing this one: https://huggingface.co/Habana/gpt2/blob/main/gaudi_config.json

regisss changed pull request status to closed Oct 20, 2023

regisss

Habana AI org Nov 24, 2023

I'm getting better results with autocast when keeping the same custom ops (the ones currently used with HMP) than with default autocast, so I'm going to update that directly.
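Roughly, reusing the HMP custom ops for autocast means carrying the existing op lists over to the autocast settings instead of discarding them. Here is a minimal sketch, assuming the lists live under "hmp_bf16_ops" / "hmp_fp32_ops" and that the autocast equivalents are named "autocast_bf16_ops" / "autocast_fp32_ops". Those key names are assumptions to check against the Optimum Habana documentation, and the op names and the helper carry_over_op_lists are purely hypothetical.

```python
def carry_over_op_lists(config: dict) -> dict:
    """Return a copy of `config` that uses autocast with the former HMP op lists."""
    updated = {
        key: value
        for key, value in config.items()
        # Drop the HMP flag and every "hmp_"-prefixed key (assumed naming scheme).
        if key != "use_habana_mixed_precision" and not key.startswith("hmp_")
    }
    updated["use_torch_autocast"] = True
    for precision in ("bf16", "fp32"):
        ops = config.get(f"hmp_{precision}_ops")
        if ops:
            # Copy the list so the original config is left untouched.
            updated[f"autocast_{precision}_ops"] = list(ops)
    return updated


# Hypothetical HMP-style config; the op names are placeholders, not the real lists.
hmp_config = {
    "use_habana_mixed_precision": True,
    "hmp_bf16_ops": ["add", "matmul"],
    "hmp_fp32_ops": ["softmax"],
}

migrated = carry_over_op_lists(hmp_config)
print(migrated)
```

Whether a given op belongs in the bf16 or fp32 list is a speed versus accuracy trade-off that still has to be validated by training runs, as the NaN loss reported earlier in this thread shows.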
