Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Paper • 2409.11355 • Published 3 days ago • 24
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published 3 days ago • 11
A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis Paper • 2409.08947 • Published 7 days ago • 11
DrawingSpinUp: 3D Animation from Single Character Drawings Paper • 2409.08615 • Published 7 days ago • 10
InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published 7 days ago • 24
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally Paper • 2409.08270 • Published 8 days ago • 8
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors Paper • 2409.08278 • Published 8 days ago • 10
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation Paper • 2408.15239 • Published 24 days ago • 27
VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges Paper • 2409.01071 • Published 18 days ago • 26
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Paper • 2409.02095 • Published 17 days ago • 32
Sapiens Collection Foundation models for human tasks. Code: https://github.com/facebookresearch/sapiens • 72 items • Updated 1 day ago • 22
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement Paper • 2408.14211 • Published 25 days ago • 8
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published 28 days ago • 21
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation Paper • 2408.13252 • Published 28 days ago • 23
Real-Time Video Generation with Pyramid Attention Broadcast Paper • 2408.12588 • Published 29 days ago • 13
Enhance Your Images Collection Some trending Gradio apps on Spaces that you can use to enhance/upscale your images for free. This collection will be kept uptodate with new releases. • 7 items • Updated 29 days ago • 17
Gradio Spaces for Background Removal Collection Enhance your images by removing the background. Will ensure these Spaces are up and maintained for the community. • 5 items • Updated about 1 month ago • 23
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51
TraDiffusion: Trajectory-Based Training-Free Image Generation Paper • 2408.09739 • Published Aug 19 • 7
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model Paper • 2408.10198 • Published Aug 19 • 32
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering Paper • 2408.09702 • Published Aug 19 • 9
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16 • 96
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published Aug 15 • 44
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13 • 65
ControlNeXt: Powerful and Efficient Control for Image and Video Generation Paper • 2408.06070 • Published Aug 12 • 52
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12 • 35
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 152
AuraFlow Collection AuraFlow v0.x series, to date the largest (6.8B) and highest fidelity (0.7+ on GenEval) open sourced text to image model. • 3 items • Updated 14 days ago • 4
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper • 2407.20183 • Published Jul 29 • 37
Mixture of Nested Experts: Adaptive Processing of Visual Tokens Paper • 2407.19985 • Published Jul 29 • 33
A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data Paper • 2407.16680 • Published Jul 23 • 11
HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions Paper • 2407.15187 • Published Jul 21 • 10
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models Paper • 2407.15841 • Published Jul 22 • 38
SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization Paper • 2407.14257 • Published Jul 19 • 5
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore Paper • 2407.12854 • Published Jul 9 • 29
Still-Moving: Customized Video Generation without Customized Video Data Paper • 2407.08674 • Published Jul 11 • 11
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Paper • 2407.07895 • Published Jul 10 • 40