Pretergeek's Collections
Multimodal (Papers)
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 47
Visual Instruction Tuning
Paper • 2304.08485 • Published • 13
Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744 • Published • 37
Making Large Multimodal Models Understand Arbitrary Visual Prompts
Paper • 2312.00784 • Published • 2
LLaVA-OneVision: Easy Visual Task Transfer
Paper • 2408.03326 • Published • 59
Unveiling Encoder-Free Vision-Language Models
Paper • 2406.11832 • Published • 49
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 40
Paper • 2410.21276 • Published • 78
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities
Paper • 2410.11190 • Published • 20
Paper • 2410.07073 • Published • 60
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Paper • 2409.18125 • Published • 33
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Paper • 2409.02889 • Published • 54
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
Paper • 2408.15881 • Published • 20
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Paper • 2407.07895 • Published • 40
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 85
What matters when building vision-language models?
Paper • 2405.02246 • Published • 98
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper • 2402.14289 • Published • 19
LEGO: Language Enhanced Multi-modal Grounding Model
Paper • 2401.06071 • Published • 10
LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
Paper • 2401.02330 • Published • 14