VideoBooth: Diffusion-based Video Generation with Image Prompts Paper • 2312.00777 • Published Dec 1, 2023 • 21
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation Paper • 2312.03641 • Published Dec 6, 2023 • 20
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation Paper • 2312.04557 • Published Dec 7, 2023 • 12
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion Paper • 2312.04433 • Published Dec 7, 2023 • 9
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning Paper • 2402.00769 • Published Feb 1, 2024 • 20
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling Paper • 2401.15977 • Published Jan 29, 2024 • 36
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding Paper • 2401.15708 • Published Jan 28, 2024 • 10
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All Paper • 2401.13795 • Published Jan 24, 2024 • 65
Deconstructing Denoising Diffusion Models for Self-Supervised Learning Paper • 2401.14404 • Published Jan 25, 2024 • 16
BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models Paper • 2401.13974 • Published Jan 25, 2024 • 12
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild Paper • 2401.13627 • Published Jan 24, 2024 • 72
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23, 2024 • 86
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text Paper • 2401.12070 • Published Jan 22, 2024 • 42
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion Paper • 2401.11053 • Published Jan 19, 2024 • 9
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Paper • 2401.11605 • Published Jan 21, 2024 • 21
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19, 2024 • 58
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19, 2024 • 53
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens Paper • 2401.09985 • Published Jan 18, 2024 • 14
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models Paper • 2401.09047 • Published Jan 17, 2024 • 13
InstantID: Zero-shot Identity-Preserving Generation in Seconds Paper • 2401.07519 • Published Jan 15, 2024 • 51
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5, 2024 • 93
FlashFace: Human Image Personalization with High-fidelity Identity Preservation Paper • 2403.17008 • Published Mar 25, 2024 • 19
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 30, 2024 • 71
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving Paper • 2404.16771 • Published Apr 25, 2024 • 16
PuLID: Pure and Lightning ID Customization via Contrastive Alignment Paper • 2404.16022 • Published Apr 24, 2024 • 19
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21, 2024 • 27
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18, 2024 • 53
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11, 2024 • 47
RULER: What's the Real Context Size of Your Long-Context Language Models? Paper • 2404.06654 • Published Apr 9, 2024 • 33
ByteEdit: Boost, Comply and Accelerate Generative Image Editing Paper • 2404.04860 • Published Apr 7, 2024 • 24
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing Paper • 2404.05717 • Published Apr 8, 2024 • 24
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators Paper • 2404.05014 • Published Apr 7, 2024 • 53
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4, 2024 • 60
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024 • 64
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2, 2024 • 52
MLCM: Multistep Consistency Distillation of Latent Diffusion Model Paper • 2406.05768 • Published Jun 9, 2024 • 8
RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network Paper • 2406.18284 • Published Jun 26, 2024 • 19
GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars Paper • 2408.13674 • Published Aug 24, 2024 • 17