In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
Abstract
This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to 10^4 elements. In contrast, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to 10^7 elements. This achievement marks a substantial leap, as it is by far the longest input processed by any open neural network model to date, demonstrating a significant improvement in the processing capabilities for long sequences.
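The core idea behind the recurrent memory augmentation is to split a long input into fixed-size segments and carry a small memory state from one segment to the next, so information from early segments can survive to the end. The toy sketch below illustrates only this segment-level recurrence; the function names, the averaging "forward pass", and the update rule are all simplifications for illustration, not the authors' implementation (real RMT prepends learned memory tokens to each segment of a transformer).

```python
# Toy sketch of segment-level recurrence, NOT the RMT implementation.
# A long token sequence is processed segment by segment; a small memory
# state is updated after each segment and passed on to the next one.

def process_long_input(tokens, segment_size=4, memory_size=2):
    memory = [0.0] * memory_size  # initial memory state
    for start in range(0, len(tokens), segment_size):
        segment = tokens[start:start + segment_size]
        # stand-in for a transformer forward pass over [memory; segment]:
        summary = sum(segment) / max(len(segment), 1)
        # recurrent update: blend the old memory with the segment summary
        memory = [0.5 * m + 0.5 * summary for m in memory]
    return memory
```

Because memory size is fixed, compute grows linearly with input length instead of quadratically, which is what makes the 10^7-element regime reachable in principle.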
Community
Great paper
Why does "QA3: Three Supporting Facts" have between 2 and 320 facts?
For such a large variance, the description is misleading, and it should be broken up into ranges to make it easier to see the distribution.
Or am I missing something?
Thank you for the feedback!
All QA* tasks are based on the bAbI dataset. Some rare samples of QA3 do indeed have a large total number of facts, but most of them have fewer than 100.
However, for QA3 only 3 supporting facts are needed to answer the question; the rest act as distractors. The supporting facts in the task context are still like a needle in a haystack.
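The BABILong construction can be pictured as scattering the task's facts among unrelated background sentences, so the relevant facts end up far apart in a very long context. Here is a minimal sketch of that idea; the function name and inputs are hypothetical (the actual benchmark draws facts from bAbI and background text from a book corpus).

```python
import random

def make_needle_in_haystack(facts, distractors, seed=0):
    """Scatter task facts among distractor sentences (BABILong-style sketch).

    `facts` and `distractors` are placeholder lists of sentences; this is
    an illustration of the construction, not the benchmark's actual code.
    """
    rng = random.Random(seed)
    context = list(distractors)
    for fact in facts:
        # insert each supporting fact at a random position in the haystack
        context.insert(rng.randrange(len(context) + 1), fact)
    return " ".join(context)
```

Growing the distractor list lengthens the haystack without changing the answer, which is how the benchmark scales task difficulty with context length.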
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory (2024)
- Transformers are Multi-State RNNs (2024)
- World Model on Million-Length Video And Language With RingAttention (2024)
- Recurrent Transformers with Dynamic Halt (2024)
- Understanding LLMs: A Comprehensive Overview from Training to Inference (2024)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Just curious: are the authors planning on releasing the code this time? It's been 2 years since the first RMT paper and there are still no working implementations in the community.
I am also interested in the code
How Recurrent Memory Revolutionizes Long Document Processing
Links:
- Subscribe: https://www.youtube.com/@Arxflix
- Twitter: https://x.com/arxflix
- LMNT (Partner): https://lmnt.com/