Commit 92de376
Parent: 1986693

Adding Evaluation Results (#1)

- Adding Evaluation Results (27c6cc6fe2c046ad4488a36341f2516054e18d66)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +128 -11
README.md CHANGED
@@ -1,17 +1,120 @@
 ---
 license: mit
 widget:
-- text: >
-    <|system|>
-
-    You are a helpful assistant</s>
-
-    <|user|>
-
-    What is your name? Tell me about yourself.</s>
-
-    <|assistant|>
-
+- text: '<|system|>
+
+    You are a helpful assistant</s>
+
+    <|user|>
+
+    What is your name? Tell me about yourself.</s>
+
+    <|assistant|>
+
+    '
+model-index:
+- name: Tinyllama-1.3B-Cinder-Reason-Test
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 32.51
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 55.85
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 26.61
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 35.59
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 62.12
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 2.35
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test
+      name: Open LLM Leaderboard
 ---
 
 A 1.3B-parameter test merge of two Cinder models (layers 1-22 and 18-22), trained on math and step-by-step reasoning. Model overview: Cinder is an AI chatbot tailored for engaging users in scientific and educational conversations, offering companionship, and sparking imaginative exploration. It is built on the TinyLlama 1.1B parameter model and trained on a unique combination of datasets. It is currently being tested on the Reason-with-cinder dataset.
@@ -20,3 +123,17 @@
 
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/obCyZSvfUefEWrOXaeB3o.png)
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Josephgflowers__Tinyllama-1.3B-Cinder-Reason-Test)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |35.84|
+|AI2 Reasoning Challenge (25-Shot)|32.51|
+|HellaSwag (10-Shot)              |55.85|
+|MMLU (5-Shot)                    |26.61|
+|TruthfulQA (0-shot)              |35.59|
+|Winogrande (5-shot)              |62.12|
+|GSM8k (5-shot)                   | 2.35|
+
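For reference, the widget prompt above follows the Zephyr-style `<|system|>` / `<|user|>` / `<|assistant|>` template with `</s>` turn separators. Below is a minimal sketch of driving that template locally with `transformers`; the sampling settings are illustrative assumptions, not values shipped with this repo.

```python
# Minimal generation sketch using the prompt template from the widget above.
# Assumes the repo id from the leaderboard URLs; sampling settings are
# illustrative, not tuned values from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build the prompt exactly as rendered by the widget example.
prompt = (
    "<|system|>\nYou are a helpful assistant</s>\n"
    "<|user|>\nWhat is your name? Tell me about yourself.</s>\n"
    "<|assistant|>\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated continuation, skipping the prompt tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```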
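The leaderboard average is the unweighted mean of the six scores: (32.51 + 55.85 + 26.61 + 35.59 + 62.12 + 2.35) / 6 ≈ 35.84. To spot-check a single entry, something like the sketch below should work, assuming EleutherAI's lm-evaluation-harness (`pip install lm-eval`), the backend the Open LLM Leaderboard runs; task names and metric keys vary across harness versions, so treat them as assumptions.

```python
# Spot-check one leaderboard entry: ARC-Challenge, 25-shot, acc_norm.
# Assumes lm-eval >= 0.4, where "arc_challenge" is the harness task name
# for AI2 Reasoning Challenge and num_fewshot=25 matches the model-index args.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Josephgflowers/Tinyllama-1.3B-Cinder-Reason-Test",
    tasks=["arc_challenge"],
    num_fewshot=25,
)
# The normalized accuracy should land near the 32.51 reported above.
print(results["results"]["arc_challenge"])
```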