Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research Paper • 2402.00159 • Published Jan 31, 2024 • 59
CodeFusion: A Pre-trained Diffusion Model for Code Generation Paper • 2310.17680 • Published Oct 26, 2023 • 69
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition Paper • 2308.03279 • Published Aug 7, 2023 • 21