Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model Paper • 2312.12423 • Published Dec 19, 2023
UniVTG: Towards Unified Video-Language Temporal Grounding Paper • 2307.16715 • Published Jul 31, 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone Paper • 2307.05463 • Published Jul 11, 2023