SarwarShafee committed · Commit 8a21c53 · 1 Parent(s): b922d97
Update README.md
README.md CHANGED
@@ -119,7 +119,7 @@ We evaluated the models on the following datasets:
 #### Evaluation on English Benchmark datasets
 - **llama-3.2-3b** consistently outperforms **titulm-llama-3.2-3b-v2.0** across all English tasks. It achieves high scores, particularly in **MMLU**, **BoolQ**, and **Commonsense QA**, with a maximum score of 0.80 on **PIQA** in the 5-shot setting.
 - In contrast, **titulm-llama-3.2-3b-v2.0** underperforms on all English benchmarks, scoring much lower than the base model, especially in **Commonsense QA** and **PIQA**, with only minor improvements between 0-shot and 5-shot.
-- It was expected as the model trained only on Bangla datasets.
+- It was expected as the model was trained only on Bangla datasets.
 
 | Model                         | Shots   | MMLU        | BoolQ  | Commonsense QA | OpenBook QA | PIQA  |
 |-------------------------------|---------|-------------|--------|----------------|-------------|-------|
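
Not part of this commit: a minimal sketch of how 0-shot and 5-shot scores like those in the table above are commonly produced, assuming EleutherAI's lm-evaluation-harness (v0.4.x). The model IDs, task names, and batch size below are assumptions for illustration, not values taken from the README.

```python
# Sketch only: reproduce English-benchmark scores with lm-evaluation-harness.
# Assumptions: harness v0.4.x, Hugging Face model IDs, and task names below.
from lm_eval import simple_evaluate

MODELS = [
    "meta-llama/Llama-3.2-3B",            # assumed ID for the base model
    "hishab/titulm-llama-3.2-3b-v2.0",    # assumed ID for the fine-tuned model
]
# Task names can differ between harness versions; adjust to your installation.
TASKS = ["mmlu", "boolq", "commonsense_qa", "openbookqa", "piqa"]

for model_id in MODELS:
    for shots in (0, 5):                   # 0-shot and 5-shot settings from the README
        results = simple_evaluate(
            model="hf",                    # Hugging Face causal-LM backend
            model_args=f"pretrained={model_id}",
            tasks=TASKS,
            num_fewshot=shots,
            batch_size=8,
        )
        # Each task reports metrics such as "acc,none"; print whatever is available.
        scores = {t: m.get("acc,none") for t, m in results["results"].items()}
        print(model_id, f"{shots}-shot", scores)
```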