Training LLMs over Neurally Compressed Text
arXiv:2404.03626
Emergence with scale is unlikely

Given the recent findings of [55], we anticipate that continuing to scale models beyond 2 billion parameters is unlikely