- Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
  Paper • 2309.15915 • Published • 2
- Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
  Paper • 2310.00653 • Published • 3
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 7
- An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
  Paper • 2309.09958 • Published • 18
Zhao
Hanyu66
AI & ML interests
CV, NLP
Organizations
None yet
Collections
1
Models
1
Datasets
None public yet