SarwarShafee committed · Commit 8a21c53 · 1 Parent(s): b922d97
Update README.md
README.md CHANGED
@@ -119,7 +119,7 @@ We evaluated the models on the following datasets:
 #### Evaluation on English Benchmark datasets
 - **llama-3.2-3b** consistently outperforms **titulm-llama-3.2-3b-v2.0** across all English tasks. It achieves high scores, particularly in **MMLU**, **BoolQ**, and **Commonsense QA**, with a maximum score of 0.80 on **PIQA** in the 5-shot setting.
 - In contrast, **titulm-llama-3.2-3b-v2.0** underperforms on all English benchmarks, scoring much lower than the base model, especially in **Commonsense QA** and **PIQA**, with only minor improvements between 0-shot and 5-shot.
-- It was expected as the model trained only on Bangla datasets.
+- It was expected as the model was trained only on Bangla datasets.
 
 | Model                         | Shots   | MMLU        | BoolQ  | Commonsense QA | OpenBook QA | PIQA  |
 |-------------------------------|---------|-------------|--------|----------------|-------------|-------|
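
Not part of this commit: a minimal sketch of how 0-shot and 5-shot scores like those in the table above are commonly produced, assuming EleutherAI's lm-evaluation-harness (v0.4.x). The model IDs, task names, and batch size below are assumptions for illustration, not values taken from the README.

```python
# Sketch only: reproduce English-benchmark scores with lm-evaluation-harness.
# Assumptions: harness v0.4.x, Hugging Face model IDs, and task names below.
from lm_eval import simple_evaluate

MODELS = [
    "meta-llama/Llama-3.2-3B",            # assumed ID for the base model
    "hishab/titulm-llama-3.2-3b-v2.0",    # assumed ID for the fine-tuned model
]
# Task names can differ between harness versions; adjust to your installation.
TASKS = ["mmlu", "boolq", "commonsense_qa", "openbookqa", "piqa"]

for model_id in MODELS:
    for shots in (0, 5):                   # 0-shot and 5-shot settings from the README
        results = simple_evaluate(
            model="hf",                    # Hugging Face causal-LM backend
            model_args=f"pretrained={model_id}",
            tasks=TASKS,
            num_fewshot=shots,
            batch_size=8,
        )
        # Each task reports metrics such as "acc,none"; print whatever is available.
        scores = {t: m.get("acc,none") for t, m in results["results"].items()}
        print(model_id, f"{shots}-shot", scores)
```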