This is great stuff and necessary.
#1 opened by 1TBGPU4EVR
We need lighter models that can perform faster and function as modular parts in real-time workflows. I can't wait to try this with Flash Attention.