Add link
README.md CHANGED
@@ -7,7 +7,7 @@ datasets:
 
 # Vision-and-Language Transformer (ViLT), fine-tuned on VQAv2
 
-Vision-and-Language Transformer (ViLT) model fine-tuned on [VQAv2](). It was introduced in the paper [ViLT: Vision-and-Language Transformer
+Vision-and-Language Transformer (ViLT) model fine-tuned on [VQAv2](https://visualqa.org/). It was introduced in the paper [ViLT: Vision-and-Language Transformer
 Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Kim et al. and first released in [this repository](https://github.com/dandelin/ViLT).
 
 Disclaimer: The team releasing ViLT did not write a model card for this model so this model card has been written by the Hugging Face team.
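For context, a minimal visual question answering sketch with the `transformers` `ViltProcessor` and `ViltForQuestionAnswering` classes; the checkpoint id `dandelin/vilt-b32-finetuned-vqa`, the image URL, and the question are assumptions for illustration, not part of this change.

```python
import requests
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

# Assumed checkpoint id for the VQAv2 fine-tuned ViLT model
checkpoint = "dandelin/vilt-b32-finetuned-vqa"

processor = ViltProcessor.from_pretrained(checkpoint)
model = ViltForQuestionAnswering.from_pretrained(checkpoint)

# Example image and question (placeholders, not from the model card diff)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
question = "How many cats are there?"

# Encode the image-question pair and run a forward pass
encoding = processor(image, question, return_tensors="pt")
outputs = model(**encoding)

# The predicted answer is the label with the highest logit
idx = outputs.logits.argmax(-1).item()
print("Predicted answer:", model.config.id2label[idx])
```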