qicao-apple committed on
Commit 43a6d81
1 Parent(s): 508b89b

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +130 -2
README.md CHANGED
@@ -40,8 +40,7 @@ openelm_3b_instruct = AutoModelForCausalLM.from_pretrained("apple/OpenELM-3B-Ins
 
 ```
 
-
- ## Example Usage
+ ## Usage
 
 Below we provide an example of loading the model via [HuggingFace Hub](https://huggingface.co/docs/hub/) as:
 
@@ -73,3 +72,132 @@ The little girl thought that this tree was very pretty. She wanted to climb up t
 """
 ```
 
+
+ ## Main Results
+
+ ### Zero-Shot
+
+ | **Model** | **ARC-c** | **ARC-e** | **BoolQ** | **HellaSwag** | **PIQA** | **SciQ** | **WinoGrande** | **Average** |
+ |-----------|-----------|-----------|-----------|---------------|----------|----------|----------------|-------------|
+ | [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 26.45 | 45.08 | **53.98** | 46.71 | 69.75 | **84.70** | **53.91** | 54.37 |
+ | [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **30.55** | **46.68** | 48.56 | **52.07** | **70.78** | 84.40 | 52.72 | **55.11** |
+ | [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 27.56 | 48.06 | 55.78 | 53.97 | 72.31 | 87.20 | 58.01 | 57.56 |
+ | [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **30.38** | **50.00** | **60.37** | **59.34** | **72.63** | **88.00** | **58.96** | **59.95** |
+ | [OpenELM-1_1B](https://huggingface.co/apple/OpenELM-1_1B) | 32.34 | **55.43** | 63.58 | 64.81 | **75.57** | **90.60** | 61.72 | 63.44 |
+ | [OpenELM-1_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **37.97** | 52.23 | **70.00** | **71.20** | 75.03 | 89.30 | **62.75** | **65.50** |
+ | [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 35.58 | 59.89 | 67.40 | 72.44 | 78.24 | **92.70** | 65.51 | 67.39 |
+ | [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **39.42** | **61.74** | **68.17** | **76.36** | **79.00** | 92.50 | **66.85** | **69.15** |
+
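The **Average** column is the arithmetic mean of the per-task scores in that row; for example, for OpenELM-270M in the zero-shot table:

```python
# Zero-shot scores for OpenELM-270M:
# ARC-c, ARC-e, BoolQ, HellaSwag, PIQA, SciQ, WinoGrande
scores = [26.45, 45.08, 53.98, 46.71, 69.75, 84.70, 53.91]

# Arithmetic mean, rounded to two decimals as in the table
average = round(sum(scores) / len(scores), 2)
print(average)  # → 54.37, matching the Average column
```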
+ ### LLM360
+
+ | **Model** | **ARC-c** | **HellaSwag** | **MMLU** | **TruthfulQA** | **WinoGrande** | **Average** |
+ |-----------|-----------|---------------|----------|----------------|----------------|-------------|
+ | [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | 47.15 | 25.72 | **39.24** | **53.83** | 38.72 |
+ | [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | **51.58** | **26.70** | 38.72 | 53.20 | **40.54** |
+ | [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | 53.86 | **26.01** | 40.18 | 57.22 | 41.50 |
+ | [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | **59.31** | 25.41 | **40.48** | **58.33** | **43.41** |
+ | [OpenELM-1_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | 65.71 | **27.05** | 36.98 | 63.22 | 45.93 |
+ | [OpenELM-1_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | **71.83** | 25.65 | **45.95** | **64.72** | **49.94** |
+ | [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | 73.28 | **26.76** | 34.98 | 67.25 | 48.90 |
+ | [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | **76.87** | 24.80 | **38.76** | **67.96** | **51.22** |
+
+
+ ### OpenLLM Leaderboard
+
+ | **Model** | **ARC-c** | **CrowS-Pairs** | **HellaSwag** | **MMLU** | **PIQA** | **RACE** | **TruthfulQA** | **WinoGrande** | **Average** |
+ |-----------|-----------|-----------------|---------------|----------|----------|----------|----------------|----------------|-------------|
+ | [OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) | 27.65 | **66.79** | 47.15 | 25.72 | 69.75 | 30.91 | **39.24** | **53.83** | 45.13 |
+ | [OpenELM-270M-Instruct](https://huggingface.co/apple/OpenELM-270M-Instruct) | **32.51** | 66.01 | **51.58** | **26.70** | **70.78** | 33.78 | 38.72 | 53.20 | **46.66** |
+ | [OpenELM-450M](https://huggingface.co/apple/OpenELM-450M) | 30.20 | **68.63** | 53.86 | **26.01** | 72.31 | 33.11 | 40.18 | 57.22 | 47.69 |
+ | [OpenELM-450M-Instruct](https://huggingface.co/apple/OpenELM-450M-Instruct) | **33.53** | 67.44 | **59.31** | 25.41 | **72.63** | **36.84** | **40.48** | **58.33** | **49.25** |
+ | [OpenELM-1_1B](https://huggingface.co/apple/OpenELM-1_1B) | 36.69 | **71.74** | 65.71 | **27.05** | **75.57** | 36.46 | 36.98 | 63.22 | 51.68 |
+ | [OpenELM-1_1B-Instruct](https://huggingface.co/apple/OpenELM-1_1B-Instruct) | **41.55** | 71.02 | **71.83** | 25.65 | 75.03 | **39.43** | **45.95** | **64.72** | **54.40** |
+ | [OpenELM-3B](https://huggingface.co/apple/OpenELM-3B) | 42.24 | **73.29** | 73.28 | **26.76** | 78.24 | **38.76** | 34.98 | 67.25 | 54.35 |
+ | [OpenELM-3B-Instruct](https://huggingface.co/apple/OpenELM-3B-Instruct) | **47.70** | 72.33 | **76.87** | 24.80 | **79.00** | 38.47 | **38.76** | **67.96** | **55.73** |
+
+ See the technical report for more results and comparisons.
+
+ ## Evaluation
+
+ ### Setup
+
+ Install the following dependencies:
+
+ ```bash
+
+ # install the public lm-eval-harness
+ harness_repo="public-lm-eval-harness"
+ git clone https://github.com/EleutherAI/lm-evaluation-harness ${harness_repo}
+ cd ${harness_repo}
+ # use the main branch as of 2024-03-15; SHA is dc90fec
+ git checkout dc90fec
+ pip install -e .
+ cd ..
+
+ # 66d6242 is the main branch as of 2024-04-01
+ pip install datasets@git+https://github.com/huggingface/datasets.git@66d6242
+ # quote the specifiers so the shell does not treat ">" as a redirection
+ pip install 'tokenizers>=0.15.2' 'transformers>=4.38.2' 'sentencepiece>=0.2.0'
+
+ ```
+
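The `>=` pins above are minimum versions. For intuition, comparing dotted numeric releases behaves like tuple comparison; a simplified sketch (real pip resolves full PEP 440 versions via the `packaging` library, so this helper is illustrative only):

```python
def at_least(installed: str, minimum: str) -> bool:
    """Return True if a dotted numeric version meets the minimum.
    Simplified sketch; pip uses full PEP 440 semantics."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

print(at_least("4.39.0", "4.38.2"))  # → True  (satisfies transformers>=4.38.2)
print(at_least("0.15.1", "0.15.2"))  # → False (below tokenizers>=0.15.2)
```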
+ ### Evaluate OpenELM
+
+ ```bash
+
+ # OpenELM-270M
+ hf_model=OpenELM-270M
+
+ # This flag is needed because lm-eval-harness sets add_bos_token to False by default,
+ # but OpenELM uses the LLaMA tokenizer, which requires add_bos_token to be True.
+ add_bos_token=True
+ batch_size=1
+
+ mkdir -p lm_eval_output
+
+ shot=0
+ task=arc_challenge,arc_easy,boolq,hellaswag,piqa,race,winogrande,sciq,truthfulqa_mc2
+ lm_eval --model hf \
+ --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token} \
+ --tasks ${task} \
+ --device cuda:0 \
+ --num_fewshot ${shot} \
+ --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
+ --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
+
+ shot=5
+ task=mmlu,winogrande
+ lm_eval --model hf \
+ --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token} \
+ --tasks ${task} \
+ --device cuda:0 \
+ --num_fewshot ${shot} \
+ --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
+ --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
+
+ shot=25
+ task=arc_challenge,crows_pairs_english
+ lm_eval --model hf \
+ --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token} \
+ --tasks ${task} \
+ --device cuda:0 \
+ --num_fewshot ${shot} \
+ --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
+ --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
+
+ shot=10
+ task=hellaswag
+ lm_eval --model hf \
+ --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token} \
+ --tasks ${task} \
+ --device cuda:0 \
+ --num_fewshot ${shot} \
+ --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
+ --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
+
+ ```
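The output paths above rely on bash pattern substitution: `${var//pattern/replacement}` replaces every match, flattening `/` and `,` into `_` so each run writes a single filesystem-safe file. A small illustration (the model and task values here are arbitrary examples, not from the script above):

```shell
hf_model=apple/OpenELM-270M     # example: a hub id containing "/"
task=arc_challenge,arc_easy     # example: a comma-separated task list
shot=0

# "/" -> "_" and "," -> "_" yield one flat filename per run
out=./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot
echo "$out"   # → ./lm_eval_output/apple_OpenELM-270M_arc_challenge_arc_easy-0shot
```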
+
+
+ ## Bias, Risks, and Limitations
+
+ Our OpenELM models are not trained with any safety guarantees; their outputs can be inaccurate, harmful, biased, or otherwise objectionable. Users and developers should therefore conduct extensive safety testing and implement filtering suited to their specific needs.
+