This is great stuff and necessary.

#1
by 1TBGPU4EVR - opened

We need lighter models that run faster and work as modular components in real-time workflows. I can't wait to try this with Flash Attention.
