Update README.md
README.md
CHANGED
@@ -44,7 +44,7 @@ Here is the performance of this model across benchmarks explored in our paper [H
 
 | MMLU 0-shot | MMLU 5-shot | GSM Direct | GSM CoT | BBH Direct | BBH CoT | TydiQA Gold-Passage | TydiQA Closed-book | Codex-Eval Pass@1 | Codex-Eval Pass@10 | AlpacaFarm vs Davinci-003 | Average |
 |:-----------:|:-----------:|:----------:|:-------:|:----------:|:-------:|:-------------------:|:------------------:|:-----------------:|:------------------:|:-------------------------:|---------|
-
+| 49.8 | 50.8 | 2.5 | 4.0 | 38.3 | 2.8 | 51.4 | 10.4 | 8.2 | 13.1 | 6.2 | 20.3 |
 
 If you use this model, please cite our work, the llama paper, and the original dataset:
 