pvduy committed on
Commit f74bac7 (1 parent: bc48eff)

Update README.md

  1. README.md +42 -2
README.md CHANGED
@@ -104,7 +104,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
  | ARC (25-shot) | 47.0 |
  | HellaSwag (10-shot) | 74.2 |
  | MMLU (5-shot) | 46.3 |
- | TruthfulQA (0-shot) | 46.43 |
+ | TruthfulQA (0-shot) | 46.5 |
  | Winogrande (5-shot) | 65.5 |
  | GSM8K (5-shot) | 42.3 |
  
@@ -112,7 +112,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
  2. BigBench:
  
  - Average: 35.26
- - Details:
+ - Details:
  
  | Task | Version | Metric | Value | Stderr |
  |-----------------------------------------------------|---------|-------------------------|-------|--------|
@@ -138,6 +138,46 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
  | bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 0.1856| 0.0110 |
  | bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 0.1269| 0.0080 |
  
+ 3. AGI:
+ - Average: 33.23
+ - Details:
+ | Task |Version| Metric |Value | |Stderr|
+ |------------------------------|------:|--------|-----:|---|-----:|
+ |agieval_aqua_rat | 0|acc |0.2126|± |0.0257|
+ | | |acc_norm|0.1890|± |0.0246|
+ |agieval_gaokao_biology | 0|acc |0.2571|± |0.0302|
+ | | |acc_norm|0.3143|± |0.0321|
+ |agieval_gaokao_chemistry | 0|acc |0.2464|± |0.0300|
+ | | |acc_norm|0.2899|± |0.0316|
+ |agieval_gaokao_chinese | 0|acc |0.2927|± |0.0291|
+ | | |acc_norm|0.3049|± |0.0294|
+ |agieval_gaokao_english | 0|acc |0.6176|± |0.0278|
+ | | |acc_norm|0.6438|± |0.0274|
+ |agieval_gaokao_geography | 0|acc |0.3015|± |0.0326|
+ | | |acc_norm|0.3065|± |0.0328|
+ |agieval_gaokao_history | 0|acc |0.3106|± |0.0303|
+ | | |acc_norm|0.3319|± |0.0308|
+ |agieval_gaokao_mathqa | 0|acc |0.2650|± |0.0236|
+ | | |acc_norm|0.2707|± |0.0237|
+ |agieval_gaokao_physics | 0|acc |0.3450|± |0.0337|
+ | | |acc_norm|0.3550|± |0.0339|
+ |agieval_logiqa_en | 0|acc |0.2980|± |0.0179|
+ | | |acc_norm|0.3195|± |0.0183|
+ |agieval_logiqa_zh | 0|acc |0.2842|± |0.0177|
+ | | |acc_norm|0.3318|± |0.0185|
+ |agieval_lsat_ar | 0|acc |0.2000|± |0.0264|
+ | | |acc_norm|0.2043|± |0.0266|
+ |agieval_lsat_lr | 0|acc |0.3176|± |0.0206|
+ | | |acc_norm|0.3275|± |0.0208|
+ |agieval_lsat_rc | 0|acc |0.4312|± |0.0303|
+ | | |acc_norm|0.4201|± |0.0301|
+ |agieval_sat_en | 0|acc |0.6117|± |0.0340|
+ | | |acc_norm|0.6117|± |0.0340|
+ |agieval_sat_en_without_passage| 0|acc |0.3398|± |0.0331|
+ | | |acc_norm|0.3495|± |0.0333|
+ |agieval_sat_math | 0|acc |0.3182|± |0.0315|
+ | | |acc_norm|0.2909|± |0.0307|
+
  ### Training Infrastructure
  
  * **Hardware**: `Stable Zephyr 3B` was trained on the Stability AI cluster across 8 nodes with 8 A100 80GBs GPUs for each nodes.
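
Review note: the `Average: 33.23` line added under `3. AGI:` can be checked against the table that follows it. The sketch below takes the unweighted mean of the 17 `acc` rows (not `acc_norm`) and lands on the same figure. Whether the README author computed the average this way is an assumption; the commit does not say. The names `AGIEVAL_ACC` and `mean_percent` are illustrative only and do not come from the repository.

```python
# `acc` values copied from the agieval_* rows added in this commit.
# The acc_norm rows are deliberately left out (assumption: the reported
# average is over `acc`, which is what reproduces 33.23).
AGIEVAL_ACC = {
    "agieval_aqua_rat": 0.2126,
    "agieval_gaokao_biology": 0.2571,
    "agieval_gaokao_chemistry": 0.2464,
    "agieval_gaokao_chinese": 0.2927,
    "agieval_gaokao_english": 0.6176,
    "agieval_gaokao_geography": 0.3015,
    "agieval_gaokao_history": 0.3106,
    "agieval_gaokao_mathqa": 0.2650,
    "agieval_gaokao_physics": 0.3450,
    "agieval_logiqa_en": 0.2980,
    "agieval_logiqa_zh": 0.2842,
    "agieval_lsat_ar": 0.2000,
    "agieval_lsat_lr": 0.3176,
    "agieval_lsat_rc": 0.4312,
    "agieval_sat_en": 0.6117,
    "agieval_sat_en_without_passage": 0.3398,
    "agieval_sat_math": 0.3182,
}


def mean_percent(scores):
    """Unweighted mean of per-task scores, expressed as a percentage."""
    return 100.0 * sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Prints 33.23, matching the "Average" line in the diff.
    print(f"{mean_percent(AGIEVAL_ACC):.2f}")
```

Because the mean is unweighted, every task counts equally regardless of how many questions it contains. A per-question weighted average would need the task sizes, which the diff does not include.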