Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published 15 days ago • 73
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 8 items • Updated 9 days ago • 163
timm tiny test models Collection A collection of very small (~300-500k parameter) models at 160x160 resolution, for testing purposes. Trained on ImageNet-1k. • 13 items • Updated Oct 2 • 3
Scalable Ranked Preference Optimization for Text-to-Image Generation Paper • 2410.18013 • Published 20 days ago • 14
Scaling Diffusion Language Models via Adaptation from Autoregressive Models Paper • 2410.17891 • Published 21 days ago • 15
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models Paper • 2410.17637 • Published 21 days ago • 34
WorldSimBench: Towards Video Generation Models as World Simulators Paper • 2410.18072 • Published 20 days ago • 16
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published 21 days ago • 43
Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies Paper • 2407.01092 • Published Jul 1 • 1
VILA-U-7B Collection VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation • 2 items • Updated 22 days ago • 5
DC-AE-Diffusion Collection Efficient Diffusion Models with Deep Compression Autoencoder • 7 items • Updated 1 day ago • 6
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning Paper • 2410.06456 • Published Oct 9 • 35