Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published Aug 22 • 62
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published Aug 22 • 33
Real-Time Video Generation with Pyramid Attention Broadcast Paper • 2408.12588 • Published Aug 22 • 14
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 56
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning Paper • 2408.11001 • Published Aug 20 • 11
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher Paper • 2408.14176 • Published Aug 26 • 59
Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation Paper • 2408.15991 • Published Aug 28 • 15
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Paper • 2408.16767 • Published Aug 29 • 29
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution Paper • 2310.16834 • Published Oct 25, 2023 • 4
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters Paper • 2408.17253 • Published Aug 30 • 35
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Paper • 2409.02095 • Published Sep 3 • 35
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4 • 89
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation Paper • 2409.02245 • Published Sep 3 • 9
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing Paper • 2409.01322 • Published Sep 2 • 94
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation Paper • 2409.03718 • Published Sep 5 • 25
Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task Paper • 2409.04005 • Published Sep 6 • 16
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis Paper • 2409.06135 • Published Sep 10 • 14
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models Paper • 2409.07452 • Published Sep 11 • 18
Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering Paper • 2409.07441 • Published Sep 11 • 10
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Paper • 2409.08240 • Published Sep 12 • 17
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors Paper • 2409.08278 • Published Sep 12 • 10
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally Paper • 2409.08270 • Published Sep 12 • 9
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos Paper • 2409.08353 • Published Sep 12 • 10
InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published Sep 13 • 30
A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis Paper • 2409.08947 • Published Sep 13 • 11
DrawingSpinUp: 3D Animation from Single Character Drawings Paper • 2409.08615 • Published Sep 13 • 14
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13 • 46
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published Sep 17 • 25
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Paper • 2409.11355 • Published Sep 17 • 28
OSV: One Step is Enough for High-Quality Image to Video Generation Paper • 2409.11367 • Published Sep 17 • 13
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published Sep 17 • 17
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction Paper • 2409.11211 • Published Sep 17 • 8
Single-Layer Learnable Activation for Implicit Neural Representation (SL^{2}A-INR) Paper • 2409.10836 • Published Sep 17 • 4
Implicit Neural Representations with Fourier Kolmogorov-Arnold Networks Paper • 2409.09323 • Published Sep 14 • 5
Towards Diverse and Efficient Audio Captioning via Diffusion Models Paper • 2409.09401 • Published Sep 14 • 6
LVCD: Reference-based Lineart Video Colorization with Diffusion Models Paper • 2409.12960 • Published Sep 19 • 22
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion Paper • 2409.12957 • Published Sep 19 • 18
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt Paper • 2409.12892 • Published Sep 19 • 5
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation Paper • 2409.12532 • Published Sep 19 • 5
FlexiTex: Enhancing Texture Generation with Visual Guidance Paper • 2409.12431 • Published Sep 19 • 11
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Paper • 2409.16160 • Published Sep 24 • 32
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Paper • 2409.15278 • Published Sep 23 • 22
MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors Paper • 2409.15273 • Published Sep 23 • 10
MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting Paper • 2409.14393 • Published Sep 22 • 7
SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending Paper • 2409.13926 • Published Sep 20 • 5
Portrait Video Editing Empowered by Multimodal Generative Priors Paper • 2409.13591 • Published Sep 20 • 15
Colorful Diffuse Intrinsic Image Decomposition in the Wild Paper • 2409.13690 • Published Sep 20 • 12
V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians Paper • 2409.13648 • Published Sep 20 • 9
Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections Paper • 2409.14677 • Published Sep 23 • 14
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction Paper • 2409.18124 • Published Sep 26 • 31
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image Paper • 2409.17280 • Published Sep 25 • 9
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion Paper • 2409.17145 • Published Sep 25 • 13
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors Paper • 2409.17058 • Published Sep 25 • 11
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27 • 25
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration Paper • 2410.00418 • Published Oct 1 • 9
SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs Paper • 2410.00337 • Published Oct 1 • 10
DressRecon: Freeform 4D Human Reconstruction from Monocular Video Paper • 2409.20563 • Published Sep 30 • 7
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation Paper • 2410.00890 • Published Oct 1 • 17
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1 • 143
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation Paper • 2410.01731 • Published Oct 2 • 15
3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection Paper • 2410.01647 • Published Oct 2 • 28
HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration Paper • 2410.01723 • Published Oct 2 • 4
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models Paper • 2410.02416 • Published Oct 3 • 25
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation Paper • 2410.01680 • Published Oct 2 • 32
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control Paper • 2410.00316 • Published Oct 1 • 4
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide Paper • 2410.04364 • Published Oct 6 • 26
Presto! Distilling Steps and Layers for Accelerating Music Generation Paper • 2410.05167 • Published Oct 7 • 15
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction Paper • 2410.04932 • Published Oct 7 • 9
RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models Paper • 2409.19989 • Published Sep 30 • 17
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach Paper • 2410.03160 • Published Oct 4 • 4
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment Paper • 2410.05255 • Published Oct 7 • 4
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation Paper • 2410.07171 • Published Oct 9 • 41
Pyramidal Flow Matching for Efficient Video Generative Modeling Paper • 2410.05954 • Published Oct 8 • 37
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation Paper • 2410.05591 • Published Oct 8 • 13
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning Paper • 2410.05664 • Published Oct 8 • 7
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9 • 40
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models Paper • 2410.08207 • Published about 1 month ago • 18
Semantic Score Distillation Sampling for Compositional Text-to-3D Generation Paper • 2410.09009 • Published 30 days ago • 13
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper • 2410.08159 • Published about 1 month ago • 23
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow Paper • 2410.07303 • Published Oct 9 • 16
Progressive Autoregressive Video Diffusion Models Paper • 2410.08151 • Published about 1 month ago • 15
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler Paper • 2410.05651 • Published Oct 8 • 13
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published 27 days ago • 50
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Paper • 2410.10774 • Published 27 days ago • 23
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations Paper • 2410.10792 • Published 27 days ago • 26
Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies Paper • 2410.10803 • Published 27 days ago • 6
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices Paper • 2410.11795 • Published 26 days ago • 16
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published 15 days ago • 21
Continuous Speech Synthesis using per-token Latent Diffusion Paper • 2410.16048 • Published 20 days ago • 28
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Paper • 2410.19355 • Published 16 days ago • 20
Scaling Diffusion Language Models via Adaptation from Autoregressive Models Paper • 2410.17891 • Published 18 days ago • 15
DPLM-2: A Multimodal Diffusion Protein Language Model Paper • 2410.13782 • Published 24 days ago • 19
Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion Paper • 2410.13674 • Published 24 days ago • 14
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion Paper • 2411.04928 • Published 3 days ago • 32
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models Paper • 2411.05007 • Published 3 days ago • 14
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation Paper • 2411.04989 • Published 3 days ago • 12
Controlling Language and Diffusion Models by Transporting Activations Paper • 2410.23054 • Published 11 days ago • 14
DreamPolish: Domain Score Distillation With Progressive Geometry Generation Paper • 2411.01602 • Published 7 days ago • 9
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D Paper • 2411.02336 • Published 6 days ago • 23