leaderboard-pr-bot committed on
Commit
0d618fa
1 Parent(s): c87658a

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +14 -1
README.md CHANGED
@@ -260,4 +260,17 @@ ientific experiments to study the effects of solar wind and other charged partic
  The use of magnetic fields to protect spacecraft and humans from the effects of radiation is an ongoing area of research and development, and there are a number of proposals and concepts for how this might be ac
  hieved. For example, some researchers have proposed using superconducting coils to create a magnetic field around the spacecraft, which could help deflect charged particles and other forms of radiation. However,
  these proposals are still in the early stages of development and are not yet ready for implementation on spacecraft.
- ```
+ ```
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_migtissera__Synthia-70B-v1.1)
+
+ | Metric                | Value                     |
+ |-----------------------|---------------------------|
+ | Avg.                  | 62.84                     |
+ | ARC (25-shot)         | 70.05                     |
+ | HellaSwag (10-shot)   | 87.12                     |
+ | MMLU (5-shot)         | 70.34                     |
+ | TruthfulQA (0-shot)   | 57.84                     |
+ | Winogrande (5-shot)   | 83.66                     |
+ | GSM8K (5-shot)        | 31.84                     |
+ | DROP (3-shot)         | 39.02                     |
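
The `Avg.` row appears to be the unweighted mean of the seven benchmark scores in the table. The PR does not state how the value is computed, so the following is only a sanity check under that assumption. The helper name `leaderboard_average` and the two-decimal rounding are illustrative choices, not part of the leaderboard tooling.

```python
# Scores copied from the table in this PR (benchmark -> value).
scores = {
    "ARC (25-shot)": 70.05,
    "HellaSwag (10-shot)": 87.12,
    "MMLU (5-shot)": 70.34,
    "TruthfulQA (0-shot)": 57.84,
    "Winogrande (5-shot)": 83.66,
    "GSM8K (5-shot)": 31.84,
    "DROP (3-shot)": 39.02,
}


def leaderboard_average(values):
    """Unweighted arithmetic mean, rounded to two decimals.

    Assumption: the leaderboard weights every benchmark equally.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. = {average}")
```

Under this assumption the computed mean agrees with the `Avg.` value reported in the table.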