Cached Transformers: Improving Transformers with Differentiable Memory Cache Paper • 2312.12742 • Published Dec 20, 2023 • 12
Paloma: A Benchmark for Evaluating Language Model Fit Paper • 2312.10523 • Published Dec 16, 2023 • 12
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 86
LongIns: A Challenging Long-context Instruction-based Exam for LLMs Paper • 2406.17588 • Published Jun 25 • 20