Abstract
The development of video large multimodal models (LMMs) has been hindered by the difficulty of curating large amounts of high-quality raw data from the web. To address this, we propose an alternative approach by creating a high-quality synthetic dataset specifically for video instruction-following, namely LLaVA-Video-178K. This dataset includes key tasks such as detailed captioning, open-ended question-answering (QA), and multiple-choice QA. By training on this dataset, in combination with existing visual instruction tuning data, we introduce LLaVA-Video, a new video LMM. Our experiments demonstrate that LLaVA-Video achieves strong performance across various video benchmarks, highlighting the effectiveness of our dataset. We plan to release the dataset, its generation pipeline, and the model checkpoints.
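To make the three task types concrete, here is a minimal sketch of what individual training records could look like. The field names and contents below are illustrative assumptions in the common LLaVA conversation style ("video", "conversations", "from", "value"), not the released schema.

```python
# Hypothetical records illustrating the dataset's three task types.
# Field names follow the common LLaVA instruction-tuning convention and
# are assumptions for illustration, not the released schema.
detailed_caption = {
    "video": "clips/example_00042.mp4",  # hypothetical path
    "conversations": [
        {"from": "human", "value": "<video>\nPlease describe this video in detail."},
        {"from": "gpt", "value": "A chef dices an onion on a wooden board, then ..."},
    ],
}

open_ended_qa = {
    "video": "clips/example_00042.mp4",
    "conversations": [
        {"from": "human", "value": "<video>\nWhat does the chef do after dicing the onion?"},
        {"from": "gpt", "value": "The chef slides the diced onion into a heated pan."},
    ],
}

multiple_choice_qa = {
    "video": "clips/example_00042.mp4",
    "conversations": [
        {"from": "human", "value": (
            "<video>\nWhat ingredient is cut first?\n"
            "A. Onion\nB. Carrot\nC. Pepper\nD. Garlic\n"
            "Answer with the option's letter."
        )},
        {"from": "gpt", "value": "A"},
    ],
}
```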
Community
Project page: https://llava-vl.github.io/blog/2024-09-30-llava-video/
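For readers who want to browse the data once released, a minimal sketch using the `datasets` library is shown below. The hub ID, config, and split names are assumptions; check the project page for the actual identifiers of the released subsets.

```python
from datasets import load_dataset

# Stream one record from the dataset. The identifiers below are assumptions:
ds = load_dataset(
    "lmms-lab/LLaVA-Video-178K",  # assumed Hugging Face hub ID
    "0_30_s_academic_v0_1",       # assumed config: one video-length/source subset
    split="caption",              # assumed split holding detailed captions
    streaming=True,               # stream records instead of downloading everything
)
print(next(iter(ds)))  # one video-instruction record
```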
Hugging Face Demo
Demo: https://huggingface.co/spaces/Tonic/Llava-Video
Credit: https://x.com/josephpollack/status/1842253368749678921
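Since the demo is a Gradio Space, it can also be queried programmatically with `gradio_client`. The sketch below only inspects the Space's API; the endpoint names and parameters are Space-specific and not documented here, so list them before calling `predict`.

```python
from gradio_client import Client

# Connect to the public demo Space and list its callable endpoints.
# Inspect the printed signatures before calling client.predict(...),
# since the exact inputs of this Space are not documented here.
client = Client("Tonic/Llava-Video")
client.view_api()  # prints each endpoint with its input/output signature
```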
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input (2024)
- Visual Context Window Extension: A New Perspective for Long Video Understanding (2024)
- Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner (2024)
- VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges (2024)
- Question-Answering Dense Video Events (2024)