LinaAlhuri/Arabic-clip-bert-lit

LinaAlhuri committed on Nov 14, 2023

Commit 4c25d9e (1 parent: cba2ced)

Update README.md

Files changed (1)

README.md +1 -2

@@ -27,8 +27,7 @@ tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-base-arabic", cache_dir=
 ## Data
-This was done through a combination of crawling Wikipedia and using commonly used pre-existing image datasets such as [CC](https://ai.google.com/research/ConceptualCaptions/). One of the most challenging obstacles for multimodal technologies is the fact that Arabic has few data resources, making huge dataset construction difficult. Another is the degradation of translated datasets adapted from well-known publicly available datasets. Whether the choice is to use translated data or genuine data, it is difficult to achieve the desired results depending on only one source, as each choice has its pros and cons. As a result, the goal of this work is to construct the largest Arabic image-text pair collection feasible by merging diverse data sources. This technique takes advantage of the rich information in genuine datasets to compensate for information loss in translated datasets. In contrast, translated datasets contribute to this work with enough pairs that cover a wide range of domains, scenarios, and objects.
+The aim was to create a comprehensive Arabic image-text dataset by combining various data sources due to the scarcity of Arabic resources. Challenges included limited Arabic data and the quality of translated datasets. The approach involved merging genuine datasets for rich information and using translated datasets to cover diverse domains, scenarios, and objects, striking a balance between their respective pros and cons.
 | Dataset name | Images   |
 | --- | --- |