Collections
Collections including paper arxiv:2304.14178
-
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 7
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Paper • 2304.14178 • Published • 2
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
Paper • 2403.12596 • Published • 9
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Paper • 2403.11703 • Published • 16
-
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Paper • 2107.00652 • Published • 2
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Paper • 2403.09622 • Published • 16
Veagle: Advancements in Multimodal Representation Learning
Paper • 2403.08773 • Published • 7
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Paper • 2304.14178 • Published • 2
-
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
Paper • 2402.14289 • Published • 19
ImageBind: One Embedding Space To Bind Them All
Paper • 2305.05665 • Published • 3
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 181
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Paper • 2206.02770 • Published • 3