Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2304.14178

Papers - mPlug-Owl

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

Paper • 2304.14178 • Published Apr 27, 2023 • 2
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Paper • 2403.12895 • Published Mar 19 • 30

Papers - Multimodal - Encoders

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

Paper • 2304.14178 • Published Apr 27, 2023 • 2
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

Paper • 2403.13447 • Published Mar 20 • 18

Papers - Image - Understanding

Veagle: Advancements in Multimodal Representation Learning

Paper • 2403.08773 • Published Jan 18 • 7
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

Paper • 2304.14178 • Published Apr 27, 2023 • 2
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs

Paper • 2403.12596 • Published Mar 19 • 9
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Paper • 2403.11703 • Published Mar 18 • 16

Papers - Image - Encoders

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

Paper • 2107.00652 • Published Jul 1, 2021 • 2
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

Paper • 2403.09622 • Published Mar 14 • 16
Veagle: Advancements in Multimodal Representation Learning

Paper • 2403.08773 • Published Jan 18 • 7
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

Paper • 2304.14178 • Published Apr 27, 2023 • 2

Papers - Multimodal

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Paper • 2402.14289 • Published Feb 22 • 19
ImageBind: One Embedding Space To Bind Them All

Paper • 2305.05665 • Published May 9, 2023 • 3
DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 181
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts

Paper • 2206.02770 • Published Jun 6, 2022 • 3

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs