rogerxfeng8 committed
Commit: e4f4e18
Parent(s): 97bc412
Change the assert to warning in __init__
When enabling phi-3-small on non-CUDA devices, the flash_attn package is not available. The assert on flash_attn in __init__ forces the process to exit. This patch changes the assert into a warning, so that a customized implementation of flash attention can be used instead.
Files changed: modeling_phi3_small.py (+2 -1)
@@ -215,7 +215,8 @@ class Phi3SmallSelfAttention(nn.Module):
                 f"Layer {layer_idx + 1} is using dense attention since it is divisible by "
                 f"{self.config.dense_attention_every_n_layers}"
             )
-            assert is_flash_attention_available, "Flash Attention is not available, but is needed for dense attention"
+            # use warnings to allow the modeling use different flash attention implementation later
+            logger.warning_once("Flash Attention is not available, but is needed for dense attention")
         else:
             # BlockSparse related Parameters
             self.blocksparse_params = BlockSparseParams.from_config(config)
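For context, a minimal, self-contained sketch of the pattern this change enables: flash_attn is imported optionally, and when it is missing the code warns instead of aborting, so a substitute attention implementation can run on non-CUDA devices. The dense_attention helper and the PyTorch scaled_dot_product_attention fallback below are illustrative assumptions, not code from this repository.

import logging

import torch
import torch.nn.functional as F

logger = logging.getLogger(__name__)

try:
    # flash_attn only builds against CUDA, so this import fails on other devices
    from flash_attn import flash_attn_func
    is_flash_attention_available = True
except ImportError:
    is_flash_attention_available = False


def dense_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, seqlen, num_heads, head_dim), the layout flash_attn expects
    if is_flash_attention_available:
        return flash_attn_func(q, k, v, causal=True)
    # With the assert replaced by a warning, execution can reach this point on
    # non-CUDA devices and a customized implementation can take over; PyTorch's
    # built-in SDPA serves as a stand-in here (it expects heads before seqlen,
    # hence the transposes).
    logger.warning("Flash Attention is not available, but is needed for dense attention")
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True
    )
    return out.transpose(1, 2)

Under this sketch, calling dense_attention on a CPU-only machine logs the warning and returns the fallback result, rather than raising an AssertionError at module construction as the pre-patch code did.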