Running this model without flash-attn
#17 · opened by lisa-tse
Hi, is it possible to run this model without flash-attn? It seems to be a requirement in your code at the moment (e.g. modeling_qwen.py).
Hi, I managed to run it on SageMaker without installing flash-attn. Looking at the code, it checks whether flash-attn is installed and, if it is not, skips the import:
```python
import inspect

from transformers.utils import (
    add_start_docstrings,
    add_start_docstrings_to_model_forward,
    is_flash_attn_2_available,
    is_flash_attn_greater_or_equal_2_10,
    logging,
    replace_return_docstrings,
)

# flash-attn is only imported when it is actually installed; when it is
# missing, these names are never bound and the flash path is skipped.
if is_flash_attn_2_available():
    from flash_attn import flash_attn_func, flash_attn_varlen_func
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa

    _flash_supports_window_size = "window_size" in list(inspect.signature(flash_attn_func).parameters)
```
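So loading on a machine without flash-attn should just work; you can also steer transformers away from the flash path explicitly. A minimal sketch, assuming a transformers version that accepts `attn_implementation` and using a placeholder repo id (substitute the actual model id; whether the kwarg is honored depends on the remote code):

```python
from transformers import AutoModel, AutoTokenizer

# Placeholder repo id for illustration; replace with the actual model repo.
model_id = "your-org/your-qwen-embedding-model"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Requesting the eager implementation keeps the flash-attn branch above
# from ever being taken, even if flash-attn happens to be installed.
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    attn_implementation="eager",
)
```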
Bumping this request: this model currently creates a bad user experience (see https://github.com/michaelfeil/infinity/issues/308), where flash-attn is checked for and an error is raised when it is missing.
I'd suggest opening a PR to upstream this remote code into the transformers library itself for a better experience. The embedding model does not use windowed attention, so there should be little need for FA2.
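For context on why FA2 is dispensable here: without a sliding window, FA2 computes the same full attention as PyTorch's built-in scaled dot-product attention, so a plain fallback loses nothing functionally. A minimal sketch of that equivalence (tensor shapes are illustrative, not taken from the model code):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, num_heads, seq_len, head_dim).
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# Full (non-windowed) attention via the built-in kernel. With no
# window_size to respect, this matches what flash_attn_func computes
# (up to numerics); is_causal mirrors a decoder-style mask and may
# differ for an embedding variant.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```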