Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering Paper • 2409.07441 • Published 9 days ago • 9
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories Paper • 2409.07440 • Published 9 days ago • 6
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models Paper • 2409.07452 • Published 9 days ago • 18
From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents Paper • 2409.03512 • Published 15 days ago • 25
GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI Paper • 2409.01392 • Published 18 days ago • 9
Efficient Detection of Toxic Prompts in Large Language Models Paper • 2408.11727 • Published 30 days ago • 11
HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors Paper • 2408.06019 • Published Aug 12 • 12
VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads Paper • 2407.18245 • Published Jul 25 • 7
Towards Building Specialized Generalist AI with System 1 and System 2 Fusion Paper • 2407.08642 • Published Jul 11 • 9
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319 • Published Jun 21 • 60
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning Paper • 2406.11896 • Published Jun 14 • 18
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Paper • 2406.10601 • Published Jun 15 • 65
HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors Paper • 2406.12459 • Published Jun 18 • 11
Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis Paper • 2406.06216 • Published Jun 10 • 17
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion Paper • 2406.04338 • Published Jun 6 • 34
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19 • 53
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation Paper • 2404.12753 • Published Apr 19 • 41
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing Paper • 2404.05717 • Published Apr 8 • 24
ByteEdit: Boost, Comply and Accelerate Generative Image Editing Paper • 2404.04860 • Published Apr 7 • 24
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Paper • 2404.03653 • Published Apr 4 • 32
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation Paper • 2404.02733 • Published Apr 3 • 20
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion Paper • 2403.18818 • Published Mar 27 • 24
Garment3DGen: 3D Garment Stylization and Texture Generation Paper • 2403.18816 • Published Mar 27 • 20
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation Paper • 2403.16990 • Published Mar 25 • 24
FlashFace: Human Image Personalization with High-fidelity Identity Preservation Paper • 2403.17008 • Published Mar 25 • 18
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions Paper • 2403.16627 • Published Mar 25 • 20
DragAnything: Motion Control for Anything using Entity Representation Paper • 2403.07420 • Published Mar 12 • 12
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5 • 93
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on Paper • 2403.01779 • Published Mar 4 • 26
MOSAIC: A Modular System for Assistive and Interactive Cooking Paper • 2402.18796 • Published Feb 29 • 23
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT Paper • 2402.16840 • Published Feb 26 • 23
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 590
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper • 2402.17485 • Published Feb 27 • 185
Music Style Transfer with Time-Varying Inversion of Diffusion Models Paper • 2402.13763 • Published Feb 21 • 9
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information Paper • 2402.13616 • Published Feb 21 • 45
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts Paper • 2402.13220 • Published Feb 20 • 12
Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability Paper • 2402.12225 • Published Feb 19 • 5
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots Paper • 2402.10329 • Published Feb 15 • 13
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss Paper • 2402.05008 • Published Feb 7 • 19
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation Paper • 2402.05054 • Published Feb 7 • 25
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation Paper • 2402.04324 • Published Feb 6 • 23
Anything in Any Scene: Photorealistic Video Object Insertion Paper • 2401.17509 • Published Jan 30 • 16