Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25 • 99
Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26 • 30
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions Paper • 2312.08578 • Published Dec 14, 2023 • 16
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference Paper • 2310.04378 • Published Oct 6, 2023 • 19
PockEngine: Sparse and Efficient Fine-tuning in a Pocket Paper • 2310.17752 • Published Oct 26, 2023 • 12