jacobfulano committed
Commit 8a9076d • 1 Parent(s): ed2a544
Update detail about Triton Flash Attention with ALiBi implementation

README.md CHANGED
@@ -64,6 +64,12 @@ This simply presets the non-learned linear bias matrix in every attention block
**To fine-tune this model for classification**, follow the [Single-task fine-tuning section of the mosaicml/examples/benchmarks/bert repo](https://github.com/mosaicml/examples/tree/main/examples/benchmarks/bert#fine-tuning).
+ ### [Update 1/2/2024] Triton Flash Attention with ALiBi
+
+ Note that, by default, Triton Flash Attention is **not** enabled or required. To enable our custom implementation of Triton Flash Attention with ALiBi from March 2023, set `attention_probs_dropout_prob: 0.0`. We are currently working on supporting Flash Attention 2 (see [PR here](https://github.com/mosaicml/examples/pull/440)) and on replacing the custom Triton implementation.
+
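As a rough sketch of what enabling this path could look like when loading the model (the repo id below is a placeholder, and we assume a masked-LM checkpoint whose config is adjusted through `AutoConfig` before loading):

```python
from transformers import AutoConfig, AutoModelForMaskedLM

# Placeholder repo id; substitute the actual model repository.
model_id = "mosaicml/mosaic-bert-base"

# Setting attention_probs_dropout_prob to 0.0 is what enables the custom
# Triton Flash Attention with ALiBi implementation described above.
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.attention_probs_dropout_prob = 0.0

model = AutoModelForMaskedLM.from_pretrained(
    model_id,
    config=config,
    trust_remote_code=True,
)
```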
### Remote Code
This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method. This is because we train using [FlashAttention (Dao et al. 2022)](https://arxiv.org/pdf/2205.14135.pdf), which is not part of the `transformers` library and depends on [Triton](https://github.com/openai/triton) and some custom PyTorch code. Since this involves executing arbitrary code, you should consider passing a git `revision` argument that specifies the exact commit of the code, for example:
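A minimal sketch of such a call (the repo id is a placeholder and the short hash is only illustrative; pin `revision` to the exact commit of the remote code you have reviewed):

```python
from transformers import AutoModelForMaskedLM

# Placeholder repo id and illustrative revision; replace both with the
# repository you are loading and the commit hash you have audited.
model = AutoModelForMaskedLM.from_pretrained(
    "mosaicml/mosaic-bert-base",
    trust_remote_code=True,
    revision="8a9076d",
)
```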