Structured Packing in LLM Training Improves Long Context Utilization Paper • 2312.17296 • Published Dec 28, 2023
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models Paper • 2308.01825 • Published Aug 3, 2023
Focused Transformer: Contrastive Training for Context Scaling Paper • 2307.03170 • Published Jul 6, 2023