Commit 83ac4b2
1 Parent(s): 253ec84

Adding Evaluation Results (#5)

- Adding Evaluation Results (1978e0ec13872cb0848162d51069431ae7268a90)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1):
  1. README.md +106 -0
README.md CHANGED
@@ -117,6 +117,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=psmathur/model_007_13b_v2
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 30.56
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pankajmathur/model_007_13b_v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 25.45
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pankajmathur/model_007_13b_v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 1.21
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pankajmathur/model_007_13b_v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 4.47
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pankajmathur/model_007_13b_v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 17.2
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pankajmathur/model_007_13b_v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 16.23
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pankajmathur/model_007_13b_v2
+      name: Open LLM Leaderboard
 ---
 
 # model_007_13b_v2
@@ -330,3 +422,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 
 |Winogrande (5-shot) |75.85|
 |GSM8k (5-shot) | 1.36|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_pankajmathur__model_007_13b_v2)
+
+| Metric            |Value|
+|-------------------|----:|
+|Avg.               |15.86|
+|IFEval (0-Shot)    |30.56|
+|BBH (3-Shot)       |25.45|
+|MATH Lvl 5 (4-Shot)| 1.21|
+|GPQA (0-shot)      | 4.47|
+|MuSR (0-shot)      |17.20|
+|MMLU-PRO (5-shot)  |16.23|
+
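As a sanity check on the added table, the reported average can be approximated as the arithmetic mean of the six per-task scores. This is a minimal sketch, not the leaderboard's own pipeline: the leaderboard averages unrounded normalized scores, so the mean of the rounded values shown here can differ from the published 15.86 in the last decimal place.

```python
# Per-task scores as reported in the README's results table.
scores = {
    "IFEval (0-Shot)": 30.56,
    "BBH (3-Shot)": 25.45,
    "MATH Lvl 5 (4-Shot)": 1.21,
    "GPQA (0-shot)": 4.47,
    "MuSR (0-shot)": 17.20,
    "MMLU-PRO (5-shot)": 16.23,
}

# Simple arithmetic mean of the six rounded scores; the official
# average (15.86) is computed from unrounded values upstream.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")
```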