Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners Paper • 2402.17723 • Published Feb 27 • 16
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper • 2402.17485 • Published Feb 27 • 185
Seamless Human Motion Composition with Blended Positional Encodings Paper • 2402.15509 • Published Feb 23 • 14
Divide-or-Conquer? Which Part Should You Distill Your LLM? Paper • 2402.15000 • Published Feb 22 • 22
ChatMusician: Understanding and Generating Music Intrinsically with LLM Paper • 2402.16153 • Published Feb 25 • 55
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models Paper • 2402.14848 • Published Feb 19 • 18
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22 • 107
DocLLM: A layout-aware generative language model for multimodal document understanding Paper • 2401.00908 • Published Dec 31, 2023 • 178