banghua committed on
Commit
aa21e7f
1 Parent(s): f4d0da8

Update README.md

Files changed (1)
  1. README.md +2 -9
README.md CHANGED
@@ -14,7 +14,7 @@ tags:
 
  <!-- Provide a quick summary of what the model is/does. -->
 
- - **Developed by:** Banghua Zhu * , Evan Frick * , Tianhao Wu * , Hanlin Zhu, Karthik Ganesan, Wei-Lin Chiang, Jian Zhang, and Jiantao Jiao.
+ - **Developed by: The Nexusflow Team (** Banghua Zhu * , Evan Frick * , Tianhao Wu * , Hanlin Zhu, Karthik Ganesan, Wei-Lin Chiang, Jian Zhang, and Jiantao Jiao).
  - **Model type:** Language Model finetuned with RLHF / RLAIF
  - **License:** Apache-2.0 license under the condition that the model is not used to compete with OpenAI
  - **Finetuned from model:** [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) (based on [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1))
@@ -22,14 +22,7 @@ tags:
 
 
  We introduce Starling-LM-7B-beta, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is trained from [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) with our new reward model [Nexusflow/Starling-RM-34B](https://huggingface.co/Nexusflow/Starling-RM-34B) and policy optimization method [Fine-Tuning Language Models from Human Preferences (PPO)](https://arxiv.org/abs/1909.08593).
- Harnessing the power of our ranking dataset, [berkeley-nest/Nectar](https://huggingface.co/datasets/berkeley-nest/Nectar), our upgraded reward model, [Starling-RM-34B](https://huggingface.co/Nexusflow/Starling-RM-34B), and our new reward training and policy tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 in MT Bench with GPT-4 as a judge.
-
-
-
- For more detailed discussions, please check out our original [blog post](https://starling.cs.berkeley.edu)!
- <!-- Provide the basic links for the model. -->
-
- - **Blog:** https://starling.cs.berkeley.edu/
+ Harnessing the power of the ranking dataset, [berkeley-nest/Nectar](https://huggingface.co/datasets/berkeley-nest/Nectar), the upgraded reward model, [Starling-RM-34B](https://huggingface.co/Nexusflow/Starling-RM-34B), and the new reward training and policy tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 in MT Bench with GPT-4 as a judge.
 
 
  ## Uses
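
For context on how a reader might try the model this README describes, a minimal usage sketch with Hugging Face `transformers` follows. The repository id `Nexusflow/Starling-LM-7B-beta` and the OpenChat-style `GPT4 Correct User` / `GPT4 Correct Assistant` prompt format are assumptions inferred from the README's statement that the model is finetuned from Openchat-3.5-0106; neither is confirmed by this commit.

```python
# Minimal sketch: loading and prompting the model described in this README.
# Assumptions (not stated in this diff): the model lives at "Nexusflow/Starling-LM-7B-beta"
# and uses the OpenChat-3.5 "GPT4 Correct User/Assistant" prompt format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nexusflow/Starling-LM-7B-beta"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # adjust dtype / device placement to your hardware
    device_map="auto",
)

# Single-turn prompt in the assumed OpenChat-style format.
prompt = "GPT4 Correct User: What is RLAIF?<|end_of_turn|>GPT4 Correct Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```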