Commit 4c25d9e by LinaAlhuri
Parent: cba2ced
Update README.md

README.md CHANGED
@@ -27,8 +27,7 @@ tokenizer = AutoTokenizer.from_pretrained("asafaya/bert-base-arabic", cache_dir=
 
 ## Data
 
-
-
+The aim was to create a comprehensive Arabic image-text dataset by combining various data sources due to the scarcity of Arabic resources. Challenges included limited Arabic data and the quality of translated datasets. The approach involved merging genuine datasets for rich information and using translated datasets to cover diverse domains, scenarios, and objects, striking a balance between their respective pros and cons.
 
 | Dataset name | Images |
 | --- | --- |