# Adding Evaluation Results (#1)

Opened by **leaderboard-pr-bot**

## README.md (CHANGED)
```diff
@@ -1,4 +1,120 @@
 ---
 license: mit
+model-index:
+- name: 3BigReasonCinder
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 41.72
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/3BigReasonCinder
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 65.16
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/3BigReasonCinder
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 44.79
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/3BigReasonCinder
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 44.76
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/3BigReasonCinder
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 64.96
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/3BigReasonCinder
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 27.6
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/3BigReasonCinder
+      name: Open LLM Leaderboard
 ---
-Not working on Hugging Face for some reason; still looking into it. Downloaded files are working as expected. GGUF files working; re-uploading. Overview: Cinder is an AI chatbot tailored for engaging users in scientific and educational conversations, offering companionship, and sparking imaginative exploration. It is built on the MiniChat 3B parameter model and trained on a unique combination of datasets.
+Not working on Hugging Face for some reason; still looking into it. Downloaded files are working as expected. GGUF files working; re-uploading. Overview: Cinder is an AI chatbot tailored for engaging users in scientific and educational conversations, offering companionship, and sparking imaginative exploration. It is built on the MiniChat 3B parameter model and trained on a unique combination of datasets.
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Josephgflowers__3BigReasonCinder)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |48.16|
+|AI2 Reasoning Challenge (25-Shot)|41.72|
+|HellaSwag (10-Shot)              |65.16|
+|MMLU (5-Shot)                    |44.79|
+|TruthfulQA (0-shot)              |44.76|
+|Winogrande (5-shot)              |64.96|
+|GSM8k (5-shot)                   |27.60|
+
```
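The `Avg.` row in the added table is simply the unweighted mean of the six benchmark scores. A minimal sanity check of that arithmetic (score values copied from the diff above; this snippet is illustrative and not part of the leaderboard tooling):

```python
# Reproduce the "Avg." row of the leaderboard table from the six per-task
# scores declared in the model-index metadata. Values copied from the diff.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 41.72,
    "HellaSwag (10-Shot)": 65.16,
    "MMLU (5-Shot)": 44.79,
    "TruthfulQA (0-shot)": 44.76,
    "Winogrande (5-shot)": 64.96,
    "GSM8k (5-shot)": 27.60,
}

average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")  # the table reports 48.16
```

Note that the benchmarks use different metric types (`acc_norm` for ARC and HellaSwag, `mc2` for TruthfulQA, plain `acc` for the rest), but the leaderboard average treats them all as a single 0-100 score.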