Joseph717171 committed on
Commit
1fd14c0
1 Parent(s): 6a3eeb1

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -20,7 +20,7 @@ SmolLM is a series of state-of-the-art small language models available in three
 To build SmolLM-Instruct, we instruction tuned the models using publicly available permissive instruction datasets. We trained all three models for one epoch on the permissive subset of the WebInstructSub dataset, combined with StarCoder2-Self-OSS-Instruct. Following this, we performed DPO (Direct Preference Optimization) for one epoch: using HelpSteer for the 135M and 1.7B models, and argilla/dpo-mix-7k for the 360M model. We followed the training parameters from the Zephyr-Gemma recipe in the alignment handbook, but adjusted the SFT (Supervised Fine-Tuning) learning rate to 3e-4.
 [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
-This is the SmolLM-360M-Instruct.
+This is the SmolLM-1.7B-Instruct.
 
 ### Generation
 ```bash