Update README.md
README.md
CHANGED
@@ -56,6 +56,10 @@ Trained with selected corpus within AIHub/Modu Corpus. The detailed dataset list
 - AI Hub: [corpus/AI_HUB](./corpus/AI_HUB)
 - Modu Corpus: [corpus/MODU_CORPUS](./corpus/MODU_CORPUS)
 
+The final JSONL dataset used to train this model is 61GB.
+
+Total number of tokens: approx. 15B tokens (using the expanded tokenizer; with the original Llama tokenizer, >60B tokens).
+
 **Vocab Expansion**
 
 | Model Name | Vocabulary Size | Description |
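The figures added in this change imply a rough token-count reduction from the vocabulary expansion. The sketch below only does arithmetic on the numbers quoted above; it assumes "61GB" means decimal gigabytes and treats ">60B" as a lower bound, and the function and key names are hypothetical. Because the 61GB figure is the size of the JSONL file, including JSON overhead, the bytes-per-token values likely overstate the raw text ratio.

```python
# Figures quoted in the README change; everything else here is illustrative.
DATASET_BYTES = 61 * 10**9        # "61GB", assuming decimal gigabytes
EXPANDED_TOKENS = 15 * 10**9      # approx. count with the expanded tokenizer
ORIGINAL_TOKENS_MIN = 60 * 10**9  # ">60B" with the original Llama tokenizer (lower bound)


def tokenizer_stats(dataset_bytes, expanded_tokens, original_tokens_min):
    """Derive rough ratios from the dataset size and the two token counts."""
    return {
        # At least this many times fewer tokens after vocabulary expansion.
        "min_token_reduction": original_tokens_min / expanded_tokens,
        # Average JSONL bytes covered by one token of the expanded tokenizer.
        "bytes_per_token_expanded": dataset_bytes / expanded_tokens,
        # Upper bound on JSONL bytes per token for the original tokenizer.
        "max_bytes_per_token_original": dataset_bytes / original_tokens_min,
    }


stats = tokenizer_stats(DATASET_BYTES, EXPANDED_TOKENS, ORIGINAL_TOKENS_MIN)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the expanded tokenizer needs at least 4 times fewer tokens for the same corpus, at roughly 4 JSONL bytes per token versus about 1 byte per token for the original tokenizer.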