Update README.md
README.md CHANGED
@@ -12,6 +12,7 @@ base_model: stabilityai/stablelm-3b-4e1t
 
 # Rocket-3B 🦝
 <b>Rocket</b> 🦝 is a 3-billion-parameter large language model that was trained on a mix of publicly available datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290). The prompt format used is <b>ChatML</b>.
+*The model name is inspired by the small but formidable character from 'Guardians of the Galaxy'. Similar to its namesake, this model, with its 3 billion parameters, showcases remarkable efficiency and effectiveness, challenging larger models despite its smaller size.*
 
 
 ## Model description
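The description in this hunk says the model expects the <b>ChatML</b> prompt format. For reference, here is a minimal sketch of that layout in Python; the system and user messages are illustrative placeholders, not content from the model card.

```python
# Sketch of the ChatML layout referenced above: each turn is wrapped in
# <|im_start|>{role} ... <|im_end|> markers, and the prompt ends with an
# opened assistant turn for the model to complete.
system_message = "You are a helpful assistant."         # illustrative placeholder
user_message = "Name the planets in the solar system."  # illustrative placeholder

prompt = (
    f"<|im_start|>system\n{system_message}<|im_end|>\n"
    f"<|im_start|>user\n{user_message}<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(prompt)
```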
@@ -31,13 +32,14 @@ Despite its compact dimensions, the model achieves outstanding scores in both MT
 | Falcon-Instruct 🦅 | 40B | SFT | 5.17 | 45.71 |
 | Orca-2 | 13B | SFT | 6.15 | - |
 | Xwin-LMv0.1 | 7B | PPO | 6.19 | 87.83 |
-| Llama2-Chat 🦙 | 7B | RLHF | 6.26 |
+| Llama2-Chat 🦙 | 7B | RLHF | 6.26 | 71.37 |
 | TÜLU 2 🐫 | 7B | DPO | 6.27 | 85.1 |
 | Guanaco 🦙 | 65B | SFT | 6.41 | 71.80 |
 | **Rocket** 🦝 | **3B** | **DPO** | **6.56** | **79.75** |
-| Llama2-Chat 🦙 | 13B | RLHF | 6.65 |
+| Llama2-Chat 🦙 | 13B | RLHF | 6.65 | 81.09 |
 | Zephyr-7b-α 🪁 | 7B | DPO | 6.88 | - |
 | Vicuna v1.3 🦙 | 33B | SFT | 7.12 | 88.99 |
+| Zephyr-7b-β 🪁 | 7B | DPO | 7.34 | 90.60 |
 | WizardLM v1.0 🦙 | 70B | SFT | 7.71 | - |
 | GPT-3.5-turbo | - | RLHF | 7.94 | 89.37 |
 
@@ -129,7 +131,7 @@ generated_text = model.generate(**inputs, max_length=3084, top_p=0.95, do_sample
 ## Bias, Risks, and Limitations
 Unlike ChatGPT, which incorporates in-the-loop filtering of responses and is aligned during the RLHF phase for safe completions, our model lacks these features. Consequently, it may generate problematic outputs, particularly when prompted in certain ways.
 
-The
+The pretraining dataset comprises a filtered mixture of open-source large-scale datasets available on the [HuggingFace Hub](https://huggingface.co/datasets): Falcon RefinedWeb extract ([Penedo et al., 2023](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)), RedPajama-Data ([Together Computer, 2023](https://github.com/togethercomputer/RedPajama-Data)) and The Pile ([Gao et al., 2020](https://arxiv.org/abs/2101.00027)), both without the *Books3* subset, and StarCoder ([Li et al., 2023](https://arxiv.org/abs/2305.06161)).
 
 
 *Model card adapted from [Zephyr Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/README.md) and [Tulu-2-7B](https://huggingface.co/allenai/tulu-2-7b/blob/main/README.md)*
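The hunk header above shows the model card's usage snippet calling `model.generate(**inputs, max_length=3084, top_p=0.95, do_sample=...)`. Below is a minimal end-to-end sketch consistent with that call, assuming the standard `transformers` API; the repository id is an assumption for illustration and should be replaced with the model card's actual id.

```python
# Minimal inference sketch, assuming the standard Hugging Face transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "pansophic/rocket-3B"  # assumed repository id; substitute the real one

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# ChatML-formatted prompt, as described in the first hunk.
prompt = (
    "<|im_start|>user\nWhat is Direct Preference Optimization?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling parameters mirror the generate() call visible in the hunk header.
generated_text = model.generate(**inputs, max_length=3084, top_p=0.95, do_sample=True)
print(tokenizer.decode(generated_text[0], skip_special_tokens=True))
```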
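Since the headline claim of this card is DPO alignment, it may help to recall what the linked paper's objective actually is. What follows is an illustrative PyTorch rendering of the DPO loss from Rafailov et al. (2023), not code from the Rocket-3B training run.

```python
# Illustrative PyTorch rendering of the DPO objective (Rafailov et al., 2023);
# not the actual Rocket-3B training code.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_w | x)
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_l | x)
    ref_chosen_logps: torch.Tensor,       # log pi_ref(y_w | x)
    ref_rejected_logps: torch.Tensor,     # log pi_ref(y_l | x)
    beta: float = 0.1,                    # KL-control temperature from the paper
) -> torch.Tensor:
    """-log sigmoid(beta * (policy log-ratio minus reference log-ratio))."""
    policy_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()
```

In practice, libraries such as TRL wrap this objective; the log-probabilities are summed over response tokens under the policy and a frozen reference model.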