migtissera leaderboard-pr-bot committed on
Commit
049b6e2
1 Parent(s): c87658a

Adding Evaluation Results (#2)

- Adding Evaluation Results (0d618fa89706dc7e329e6117fff25efa8a3ea031)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -260,4 +260,17 @@ ientific experiments to study the effects of solar wind and other charged partic
  The use of magnetic fields to protect spacecraft and humans from the effects of radiation is an ongoing area of research and development, and there are a number of proposals and concepts for how this might be ac
  hieved. For example, some researchers have proposed using superconducting coils to create a magnetic field around the spacecraft, which could help deflect charged particles and other forms of radiation. However,
  these proposals are still in the early stages of development and are not yet ready for implementation on spacecraft.
- ```
+ ```
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_migtissera__Synthia-70B-v1.1)
+
+ | Metric | Value |
+ |-----------------------|---------------------------|
+ | Avg. | 62.84 |
+ | ARC (25-shot) | 70.05 |
+ | HellaSwag (10-shot) | 87.12 |
+ | MMLU (5-shot) | 70.34 |
+ | TruthfulQA (0-shot) | 57.84 |
+ | Winogrande (5-shot) | 83.66 |
+ | GSM8K (5-shot) | 31.84 |
+ | DROP (3-shot) | 39.02 |