ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting Paper • 2410.17856 • Published 21 days ago • 48
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29 • 56
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing Paper • 2402.10294 • Published Feb 15 • 22
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation Paper • 2404.13026 • Published Apr 19 • 23
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors Paper • 2306.17843 • Published Jun 30, 2023 • 43
Agents: An Open-source Framework for Autonomous Language Agents Paper • 2309.07870 • Published Sep 14, 2023 • 42