Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models Paper • 2406.09403 • Published Jun 13, 2024
BLINK: Multimodal Large Language Models Can See but Not Perceive Paper • 2404.12390 • Published Apr 18, 2024
PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3 Paper • 2211.09699 • Published Nov 15, 2022
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training Paper • 2306.01693 • Published Jun 2, 2023