Commit 049b6e2
Parent(s): c87658a
Adding Evaluation Results (#2)
- Adding Evaluation Results (0d618fa89706dc7e329e6117fff25efa8a3ea031)
Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>
README.md
CHANGED
@@ -260,4 +260,17 @@ ientific experiments to study the effects of solar wind and other charged partic
 The use of magnetic fields to protect spacecraft and humans from the effects of radiation is an ongoing area of research and development, and there are a number of proposals and concepts for how this might be ac
 hieved. For example, some researchers have proposed using superconducting coils to create a magnetic field around the spacecraft, which could help deflect charged particles and other forms of radiation. However,
 these proposals are still in the early stages of development and are not yet ready for implementation on spacecraft.
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_migtissera__Synthia-70B-v1.1)
+
+| Metric | Value |
+|-----------------------|---------------------------|
+| Avg. | 62.84 |
+| ARC (25-shot) | 70.05 |
+| HellaSwag (10-shot) | 87.12 |
+| MMLU (5-shot) | 70.34 |
+| TruthfulQA (0-shot) | 57.84 |
+| Winogrande (5-shot) | 83.66 |
+| GSM8K (5-shot) | 31.84 |
+| DROP (3-shot) | 39.02 |
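The `Avg.` row can be sanity-checked against the seven benchmark rows. The sketch below assumes that `Avg.` is the unweighted arithmetic mean of those seven scores, rounded to two decimals; the commit does not state this, though the numbers are consistent with it. The helper name `leaderboard_average` is hypothetical and is not part of any leaderboard tooling.

```python
# Scores copied from the evaluation table added in this commit.
scores = {
    "ARC (25-shot)": 70.05,
    "HellaSwag (10-shot)": 87.12,
    "MMLU (5-shot)": 70.34,
    "TruthfulQA (0-shot)": 57.84,
    "Winogrande (5-shot)": 83.66,
    "GSM8K (5-shot)": 31.84,
    "DROP (3-shot)": 39.02,
}


def leaderboard_average(values):
    """Hypothetical helper: unweighted mean, rounded to two decimals.

    Assumption: the table's Avg. row is computed this way.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. = {average}")  # compare with the Avg. row in the table
```

Under that assumption the computed value matches the reported `Avg.` of 62.84.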