## Model Description

`StableLM Zephyr 3B` is a 3 billion parameter instruction-tuned model inspired by [HuggingFaceH4's Zephyr 7B](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) training pipeline. It was trained on a mix of publicly available and synthetic datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290), and evaluated on [MT Bench](https://tatsu-lab.github.io/alpaca_eval/) and the [Alpaca Benchmark](https://tatsu-lab.github.io/alpaca_eval/).
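As a quick reference for the preference stage, DPO optimizes the policy directly on pairs of preferred and dispreferred responses, with no separate reward model. The objective, as given in the linked paper, is

$$
\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]
$$

where $y_w$/$y_l$ are the chosen/rejected responses, $\pi_{\text{ref}}$ is the frozen SFT reference model, and $\beta$ controls how far the tuned policy may drift from the reference.
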
## Usage

Get started generating text with `Stable Zephyr 3B` by using the following code snippet:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-zephyr-3b-dpo")
model = AutoModelForCausalLM.from_pretrained(
    "stable-zephyr-3b",
    trust_remote_code=True,
)
# Assumed reconstruction: only the final `print` line of the generation code is
# visible in this excerpt, so a standard chat-template prompt and `generate`
# call are sketched here.
prompt = [{"role": "user", "content": "List three synonyms for the word 'tiny'."}]
inputs = tokenizer.apply_chat_template(prompt, add_generation_prompt=True, return_tensors="pt")
tokens = model.generate(inputs.to(model.device), max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```

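Note that `trust_remote_code=True` is most likely required because StableLM checkpoints of this generation shipped their modeling code inside the model repository rather than in the `transformers` library itself.
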
## Model Details

* **Developed by**: [Stability AI](https://stability.ai/)
* **Model type**: `StableLM Zephyr 3B` models are auto-regressive language models based on the transformer decoder architecture.
* **Language(s)**: English
* **Library**: [Alignment Handbook](https://github.com/huggingface/alignment-handbook.git)
* **Finetuned from model**: [stabilityai/stablelm-3b-4e1t](https://huggingface.co/stabilityai/stablelm-3b-4e1t)

The dataset is comprised of a mixture of open, large-scale datasets available…

| Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
|-------|------|-----------|------------------|-------------------------|
| **StableLM Zephyr 3B** 🪁 | 3B | DPO | 6.64 | 76.00 |
| Stable Zephyr (SFT only) | 3B | SFT | 6.04 | 71.15 |
| MPT-Chat | 7B | dSFT | 5.42 | - |
| Xwin-LM v0.1 | 7B | dPPO | 6.19 | 87.83 |

(The "d" prefix follows the Zephyr paper's notation for distilled alignment, e.g. dSFT for distilled supervised fine-tuning.)

### Training Infrastructure

* **Hardware**: `StableLM Zephyr 3B` was trained on the Stability AI cluster across 8 nodes, each with 8 A100 80GB GPUs.
* **Code Base**: We used our internal scripts for the SFT steps and the [HuggingFace Alignment Handbook script](https://github.com/huggingface/alignment-handbook) for DPO training.

## Use and Limitations