pvduy committed on
Commit
3da1353
1 Parent(s): 18fd9a0

Update README.md

  1. README.md +28 -0
README.md CHANGED
@@ -94,6 +94,34 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
  | Claude 2 | - |RLHF |8.06| 91.36|
  | GPT-4 | -| RLHF |8.99| 95.28|
 
+ ## Other benchmarks
+ - BigBench: 0.3526
+
+
+ | Task                                             | Version | Metric                | Value  | Stderr |
+ |--------------------------------------------------|---------|-----------------------|--------|--------|
+ | bigbench_causal_judgement                        | 0       | multiple_choice_grade | 0.5316 | 0.0363 |
+ | bigbench_date_understanding                      | 0       | multiple_choice_grade | 0.4363 | 0.0259 |
+ | bigbench_disambiguation_qa                       | 0       | multiple_choice_grade | 0.3217 | 0.0291 |
+ | bigbench_dyck_languages                          | 0       | multiple_choice_grade | 0.1450 | 0.0111 |
+ | bigbench_formal_fallacies_syllogisms_negation    | 0       | multiple_choice_grade | 0.4982 | 0.0042 |
+ | bigbench_geometric_shapes                        | 0       | multiple_choice_grade | 0.1086 | 0.0164 |
+ | bigbench_hyperbaton                              | 0       | exact_str_match       | 0.0000 | 0.0000 |
+ | bigbench_logical_deduction_five_objects          | 0       | multiple_choice_grade | 0.5232 | 0.0022 |
+ | bigbench_logical_deduction_seven_objects         | 0       | multiple_choice_grade | 0.2480 | 0.0193 |
+ | bigbench_logical_deduction_three_objects         | 0       | multiple_choice_grade | 0.1814 | 0.0146 |
+ | bigbench_movie_recommendation                    | 0       | multiple_choice_grade | 0.4067 | 0.0284 |
+ | bigbench_navigate                                | 0       | multiple_choice_grade | 0.2580 | 0.0196 |
+ | bigbench_reasoning_about_colored_objects         | 0       | multiple_choice_grade | 0.5990 | 0.0155 |
+ | bigbench_ruin_names                              | 0       | multiple_choice_grade | 0.4370 | 0.0111 |
+ | bigbench_salient_translation_error_detection     | 0       | multiple_choice_grade | 0.3951 | 0.0231 |
+ | bigbench_snarks                                  | 0       | multiple_choice_grade | 0.2265 | 0.0133 |
+ | bigbench_sports_understanding                    | 0       | multiple_choice_grade | 0.6464 | 0.0356 |
+ | bigbench_temporal_sequences                      | 0       | multiple_choice_grade | 0.5091 | 0.0159 |
+ | bigbench_tracking_shuffled_objects_five_objects  | 0       | multiple_choice_grade | 0.2680 | 0.0140 |
+ | bigbench_tracking_shuffled_objects_seven_objects | 0       | multiple_choice_grade | 0.1856 | 0.0110 |
+ | bigbench_tracking_shuffled_objects_three_objects | 0       | multiple_choice_grade | 0.1269 | 0.0080 |
+
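The commit does not say how the headline BigBench figure of 0.3526 was aggregated. The per-task values are consistent with an unweighted mean over the 20 `multiple_choice_grade` rows, leaving out `bigbench_hyperbaton` (the only `exact_str_match` row). That is an inference from the numbers, not something the README states. A minimal Python sketch of the arithmetic, with values copied from the table and illustrative names (`scores`, `EXACT_MATCH_TASKS`):

```python
# Per-task values copied from the BigBench table in this diff.
scores = {
    "bigbench_causal_judgement": 0.5316,
    "bigbench_date_understanding": 0.4363,
    "bigbench_disambiguation_qa": 0.3217,
    "bigbench_dyck_languages": 0.1450,
    "bigbench_formal_fallacies_syllogisms_negation": 0.4982,
    "bigbench_geometric_shapes": 0.1086,
    "bigbench_hyperbaton": 0.0000,
    "bigbench_logical_deduction_five_objects": 0.5232,
    "bigbench_logical_deduction_seven_objects": 0.2480,
    "bigbench_logical_deduction_three_objects": 0.1814,
    "bigbench_movie_recommendation": 0.4067,
    "bigbench_navigate": 0.2580,
    "bigbench_reasoning_about_colored_objects": 0.5990,
    "bigbench_ruin_names": 0.4370,
    "bigbench_salient_translation_error_detection": 0.3951,
    "bigbench_snarks": 0.2265,
    "bigbench_sports_understanding": 0.6464,
    "bigbench_temporal_sequences": 0.5091,
    "bigbench_tracking_shuffled_objects_five_objects": 0.2680,
    "bigbench_tracking_shuffled_objects_seven_objects": 0.1856,
    "bigbench_tracking_shuffled_objects_three_objects": 0.1269,
}

# Assumption: the exact_str_match task is excluded from the headline average.
EXACT_MATCH_TASKS = {"bigbench_hyperbaton"}


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


mc_mean = mean(v for task, v in scores.items() if task not in EXACT_MATCH_TASKS)
all_mean = mean(scores.values())

print(f"multiple_choice_grade tasks only: {mc_mean:.4f}")  # 0.3526
print(f"all 21 tasks:                     {all_mean:.4f}")  # 0.3358
```

Only the 20-task mean reproduces the reported 0.3526; averaging all 21 rows gives 0.3358 instead.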
  ### Training Infrastructure
 
  * **Hardware**: `Stable Zephyr 3B` was trained on the Stability AI cluster across 8 nodes, each with 8 A100 80GB GPUs.