Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Paper • 2408.16725 • Published Aug 29 • 52
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 56
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems Paper • 2406.12428 • Published Jun 18 • 1
Audio Dialogues: Dialogues dataset for audio and music understanding Paper • 2404.07616 • Published Apr 11 • 15
DeepSeek-VL: Towards Real-World Vision-Language Understanding Paper • 2403.05525 • Published Mar 8 • 39
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation Paper • 2404.05674 • Published Apr 8 • 13