Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -15,7 +15,7 @@ language:
15
 
16
  ## Model Summary
17
 
18
- SmolLM is a series of state-of-the-art small language models available in three sizes: 135M, 360M, and 1.7B parameters. These models are built on Cosmo-Corpus, a meticulously curated high-quality training dataset. Cosmo-Corpus includes Cosmopedia v2 (28B tokens of synthetic textbooks and stories generated by Mixtral), Python-Edu (4B tokens of educational Python samples from The Stack), and FineWeb-Edu (220B tokens of deduplicated educational web samples from FineWeb). For duther details, we refer to our [blogpost](https://huggingface.co/blog/smollm).
19
 
20
  To build SmolLM-Instruct, we instruction tuned the models using publicly available permissive instruction datasets. We trained all three models for one epoch on the permissive subset of the WebInstructSub dataset, combined with StarCoder2-Self-OSS-Instruct. Following this, we performed DPO (Direct Preference Optimization) for one epoch: using HelpSteer for the 135M and 1.7B models, and argilla/dpo-mix-7k for the 360M model. We followed the training parameters from the Zephyr-Gemma recipe in the alignment handbook, but adjusted the SFT (Supervised Fine-Tuning) learning rate to 3e-4.
21
  [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
15
 
16
  ## Model Summary
17
 
18
+ SmolLM is a series of state-of-the-art small language models available in three sizes: 135M, 360M, and 1.7B parameters. These models are built on Cosmo-Corpus, a meticulously curated high-quality training dataset. Cosmo-Corpus includes Cosmopedia v2 (28B tokens of synthetic textbooks and stories generated by Mixtral), Python-Edu (4B tokens of educational Python samples from The Stack), and FineWeb-Edu (220B tokens of deduplicated educational web samples from FineWeb). For further details, we refer to our [blogpost](https://huggingface.co/blog/smollm).
19
 
20
  To build SmolLM-Instruct, we instruction tuned the models using publicly available permissive instruction datasets. We trained all three models for one epoch on the permissive subset of the WebInstructSub dataset, combined with StarCoder2-Self-OSS-Instruct. Following this, we performed DPO (Direct Preference Optimization) for one epoch: using HelpSteer for the 135M and 1.7B models, and argilla/dpo-mix-7k for the 360M model. We followed the training parameters from the Zephyr-Gemma recipe in the alignment handbook, but adjusted the SFT (Supervised Fine-Tuning) learning rate to 3e-4.
21
  [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)