Commit 92de376
Parent(s): 1986693

Adding Evaluation Results (#1)

Adding Evaluation Results (27c6cc6fe2c046ad4488a36341f2516054e18d66)
Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>
README.md
CHANGED
@@ -1,17 +1,120 @@
 ---
 license: mit
 widget:
-
-
-
-
-
-
-
-
-
-
-
+- text: '<|system|>
+
+  You are a helpful assistant</s>
+
+  <|user|>
+
+  What is your name? Tell me about yourself.</s>
+
+  <|assistant|>
+
+  '
+model-index:
+- name: Tinyllama-1.3B-Cinder-Reason-Test
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 32.51
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 55.85
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 26.61
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 35.59
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 62.12
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 2.35
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test
+      name: Open LLM Leaderboard
 ---
 
 1.3B test of two Cinder models merged layers 1-22 and 18-22, trained on math and step by step reasoning. Model Overview Cinder is an AI chatbot tailored for engaging users in scientific and educational conversations, offering companionship, and sparking imaginative exploration. It is built on the TinyLlama 1.1B parameter model and trained on a unique combination of datasets. Testing on Reason-with-cinder dataset.
@@ -20,3 +123,17 @@ widget:
 
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/obCyZSvfUefEWrOXaeB3o.png)
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Josephgflowers__Tinyllama-1.3B-Cinder-Reason-Test)
+
+| Metric                           | Value |
+|----------------------------------|------:|
+| Avg.                             | 35.84 |
+| AI2 Reasoning Challenge (25-Shot)| 32.51 |
+| HellaSwag (10-Shot)              | 55.85 |
+| MMLU (5-Shot)                    | 26.61 |
+| TruthfulQA (0-shot)              | 35.59 |
+| Winogrande (5-shot)              | 62.12 |
+| GSM8k (5-shot)                   |  2.35 |
+
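For context on what this commit adds: the new `widget` entry defines the chat prompt the model card's inference widget will send, and the `model-index` block records the leaderboard scores. Below is a minimal sketch of driving the model with that same prompt format via `transformers`; the model id is taken from the `model-index` entry, and the template structure is inferred from the widget text rather than documented by the author.

```python
# Minimal sketch: query the model with the chat template the new widget
# entry uses. The model id comes from the model-index block in this commit;
# the <|system|>/<|user|>/<|assistant|> structure is inferred from the
# widget text, not separately documented.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Same structure as the widget example: each turn is closed with </s>,
# and generation continues from the open <|assistant|> turn.
prompt = (
    "<|system|>\nYou are a helpful assistant</s>\n"
    "<|user|>\nWhat is your name? Tell me about yourself.</s>\n"
    "<|assistant|>\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```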