stereoplegic's Collections
Trellis Networks for Sequence Modeling (arXiv:1810.06682)
ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models (arXiv:2311.01981)
Gated recurrent neural networks discover attention (arXiv:2309.01775)
Inverse Approximation Theory for Nonlinear Recurrent Neural Networks (arXiv:2305.19190)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
On the Universality of Linear Recurrences Followed by Nonlinear Projections (arXiv:2307.11888)
Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions (arXiv:2310.18780)
Cached Transformers: Improving Transformers with Differentiable Memory Cache (arXiv:2312.12742)
RNNs of RNNs: Recursive Construction of Stable Assemblies of Recurrent Neural Networks (arXiv:2106.08928)
StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization (arXiv:2311.14495)
Hierarchically Gated Recurrent Neural Network for Sequence Modeling (arXiv:2311.04823)
Enhancing Transformer RNNs with Multiple Temporal Perspectives (arXiv:2402.02625)
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427)
Improving Token-Based World Models with Parallel Observation Prediction (arXiv:2402.05643)
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models (arXiv:2404.04478)
HGRN2: Gated Linear RNNs with State Expansion (arXiv:2404.07904)
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression (arXiv:2407.12077)