HAODONG DUAN

KennyUTC · https://kennymckormick.github.io

AI & ML interests

Video Understanding; Multi-Modal Learning

Articles

Claude-3.5 Evaluation Results on Open VLM Leaderboard

RealWorldQA, What's New?

KennyUTC's activity

upvoted a paper 17 days ago

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Paper • 2410.17637 • Published 18 days ago • 34

upvoted a paper 18 days ago

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Paper • 2410.17247 • Published 19 days ago • 43

upvoted 3 papers 19 days ago

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Paper • 2410.13861 • Published 24 days ago • 53

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

Paper • 2410.16256 • Published 20 days ago • 58

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Paper • 2410.16268 • Published 20 days ago • 65

upvoted a paper 24 days ago

ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

Paper • 2410.12405 • Published 25 days ago • 13

upvoted a collection about 2 months ago

VisionLM

455 items • Updated 2 days ago • 30

upvoted a paper 2 months ago

POINTS: Improving Your Vision-language Model with Affordable Strategies

Paper • 2409.04828 • Published Sep 7 • 22

upvoted a collection 3 months ago

VILA: On Pre-training for Visual Language Models

10 items • Updated 10 days ago • 45

upvoted a paper 3 months ago

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

Paper • 2408.03361 • Published Aug 6 • 85

upvoted 2 papers 4 months ago

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Paper • 2407.11691 • Published Jul 16 • 13

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3 • 92

upvoted a collection 4 months ago

InternVL 2.0

Expanding Performance Boundaries of Open-Source MLLM • 16 items • Updated 20 days ago • 75

upvoted a paper 5 months ago

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

Paper • 2406.17770 • Published Jun 25 • 18

upvoted a collection 5 months ago

AI Paper of the Day

A collection of papers that I think are interesting, one added each day • 213 items • Updated about 16 hours ago • 27

upvoted an article 5 months ago

Claude-3.5 Evaluation Results on Open VLM Leaderboard

Article • Jun 24 • 5

upvoted 4 papers 5 months ago

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Paper • 2406.14515 • Published Jun 20 • 32

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Paper • 2406.14544 • Published Jun 20 • 34

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Paper • 2406.11833 • Published Jun 17 • 61

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Paper • 2406.04325 • Published Jun 6 • 71