Raincleared
committed on
Commit
•
3935c6b
1
Parent(s):
e030238
Upload README.md with huggingface_hub
README.md
CHANGED
@@ -82,11 +82,13 @@ The evaluation results on the above benchmarks demonstrate the advantage of ProS
 
 - **Code Generation**: We compute the average pass@1 scores on HumanEval (0-shot) and MBPP (3-shot).
 
-- **Commonsense Reasoning**: We report the average 0-shot
+- **Commonsense Reasoning**: We report the average 0-shot accuracies on PIQA, SIQA, HellaSwag, WinoGrande, and COPA.
 
-- **Reading Comprehension**: We compute the average 0-shot
+- **Reading Comprehension**: We compute the average 0-shot accuracies on BoolQ, 0-shot accuracy on LAMBADA and TyDi QA.
 
-- **Other Popular Benchmarks**: We report the average accuracies on GSM8K (8-shot), MMLU (5-shot), Big Bench Hard (BBH) (3-shot), and
+- **Other Popular Benchmarks**: We report the average accuracies on GSM8K (8-shot), MMLU (5-shot), Big Bench Hard (BBH) (3-shot), and AGI-Eval (0-shot). Refer to Appendix~\ref{sec:eval-details} for more details.
+
+Note: For PIQA, SIQA, HellaSwag, WinoGrande, COPA, BoolQ, LAMBADA, TyDi QA, and AGI-Eval, we obtain the predicted answers based on maximized perplexity. For GSM8K, MMLU, and BBH, the predicted answers are directly generated.
 
 | Setting | Average<br>Sparsity | Code<br>Generation | Commonsense<br>Reasoning | Reading<br>Comprehension | GSM8K | MMLU | BBH | AGI Eval | Average |
 | :-------------------: | :-----------------: | :----------------: | :----------------------: | :----------------------: | :---: | :---: | :---: | :---------: | :-----: |
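The note added in this hunk says that answers for the multiple-choice benchmarks are chosen by perplexity rather than generated. A minimal sketch of that kind of selection follows. It assumes the common convention of picking the candidate with the lowest perplexity (equivalently, the highest length-normalized likelihood); the README's wording is "maximized perplexity", so the rule the authors actually use may differ. The function names and the log-probability values are hypothetical and stand in for scores that a language model would produce.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of one candidate from its per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    # exp of the negative mean log-probability
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_answer(candidate_logprobs: Sequence[Sequence[float]]) -> int:
    """Index of the candidate with the lowest perplexity (assumed convention)."""
    scores = [perplexity(lp) for lp in candidate_logprobs]
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical per-token log-probabilities for three answer options.
candidates = [
    [-2.0, -1.5, -3.0],
    [-0.5, -0.7],
    [-1.0, -4.0, -0.2, -0.9],
]
best = pick_answer(candidates)
```

Normalizing by token count keeps longer answer options from being penalized simply for having more tokens, which is one reason perplexity is often preferred over the raw summed log-probability in this setting.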