NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? • arXiv:2407.11963 • Published Jul 16, 2024 • 43 upvotes
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM • arXiv:2401.02994 • Published Jan 4, 2024 • 47 upvotes
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts • arXiv:2401.04081 • Published Jan 8, 2024 • 70 upvotes
LLM Augmented LLMs: Expanding Capabilities through Composition • arXiv:2401.02412 • Published Jan 4, 2024 • 36 upvotes
Understanding LLMs: A Comprehensive Overview from Training to Inference • arXiv:2401.02038 • Published Jan 4, 2024 • 61 upvotes
LLaMA Beyond English: An Empirical Study on Language Capability Transfer • arXiv:2401.01055 • Published Jan 2, 2024 • 54 upvotes
DocLLM: A layout-aware generative language model for multimodal document understanding • arXiv:2401.00908 • Published Dec 31, 2023 • 181 upvotes
Unicron: Economizing Self-Healing LLM Training at Scale • arXiv:2401.00134 • Published Dec 30, 2023 • 9 upvotes
GeoGalactica: A Scientific Large Language Model in Geoscience • arXiv:2401.00434 • Published Dec 31, 2023 • 7 upvotes
Boosting Large Language Model for Speech Synthesis: An Empirical Study • arXiv:2401.00246 • Published Dec 30, 2023 • 10 upvotes
Improving Text Embeddings with Large Language Models • arXiv:2401.00368 • Published Dec 31, 2023 • 79 upvotes
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models • arXiv:2401.00788 • Published Jan 1, 2024 • 21 upvotes
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws • arXiv:2401.00448 • Published Dec 31, 2023 • 28 upvotes
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 • arXiv:2312.16171 • Published Dec 26, 2023 • 34 upvotes
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling • arXiv:2312.15166 • Published Dec 23, 2023 • 56 upvotes
Time is Encoded in the Weights of Finetuned Language Models • arXiv:2312.13401 • Published Dec 20, 2023 • 19 upvotes