Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published May 23 • 26
CAT3D: Create Anything in 3D with Multi-View Diffusion Models Paper • 2405.10314 • Published May 16 • 43
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels Paper • 2405.07526 • Published May 13 • 17
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation Paper • 2405.07065 • Published May 11 • 16
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound Paper • 2405.00233 • Published Apr 30 • 13
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models Paper • 2404.17672 • Published Apr 26 • 18
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving Paper • 2404.16771 • Published Apr 25 • 16
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published Apr 25 • 53
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published Apr 25 • 35
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections Paper • 2404.16845 • Published Feb 14 • 6
Interactive3D: Create What You Want by Interactive 3D Generation Paper • 2404.16510 • Published Apr 25 • 18
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published Apr 22 • 21
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Paper • 2404.14396 • Published Apr 22 • 18
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21 • 27
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study Paper • 2404.14047 • Published Apr 22 • 44
TextSquare: Scaling up Text-Centric Visual Instruction Tuning Paper • 2404.12803 • Published Apr 19 • 29