Joseph717171 committed
Commit 1fd14c0 • 1 Parent(s): 6a3eeb1
Update README.md
README.md CHANGED
@@ -20,7 +20,7 @@ SmolLM is a series of state-of-the-art small language models available in three
 To build SmolLM-Instruct, we instruction tuned the models using publicly available permissive instruction datasets. We trained all three models for one epoch on the permissive subset of the WebInstructSub dataset, combined with StarCoder2-Self-OSS-Instruct. Following this, we performed DPO (Direct Preference Optimization) for one epoch: using HelpSteer for the 135M and 1.7B models, and argilla/dpo-mix-7k for the 360M model. We followed the training parameters from the Zephyr-Gemma recipe in the alignment handbook, but adjusted the SFT (Supervised Fine-Tuning) learning rate to 3e-4.
 [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
-This is the SmolLM-
+This is the SmolLM-1.7B-Instruct.
 
 ### Generation
 ```bash
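
The hunk ends where the README's Generation section begins, so the code it introduces is cut off by the diff context. As a reference point only, here is a minimal sketch of chat-style generation with the checkpoint this card describes, assuming the standard `transformers` chat-template API; the prompt and sampling settings are illustrative, and the snippet in the actual README may differ.

```python
# Hedged sketch of generation with SmolLM-1.7B-Instruct (assumed usage, not the README's exact snippet).
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-1.7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Format a single-turn chat with the model's chat template, then sample a reply.
messages = [{"role": "user", "content": "What are the benefits of instruction tuning?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.6, top_p=0.92)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```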