The announcements for this model excitingly highlighted its vision-to-text capabilities, but it's not clear from any of the documents I can find how to leverage them. Are there any VQA (visual question answering) examples someone could share?