beomi/open-llama-2-ko-7b

beomi committed on Dec 14, 2023

Commit b82ed08 • 1 Parent(s): 5eeb547

Update README.md

Files changed (1)

README.md +4 -0

@@ -56,6 +56,10 @@ Trained with selected corpus within AIHub/Modu Corpus. The detailed dataset list
 - AI Hub: [corpus/AI_HUB](./corpus/AI_HUB)
 - Modu Corpus: [corpus/MODU_CORPUS](./corpus/MODU_CORPUS)
+The final JSONL dataset used to train this model is 61GB.
+Total number of tokens: approx. 15B tokens (*using the expanded tokenizer; with the original Llama tokenizer, >60B tokens).
 **Vocab Expansion**
 | Model Name | Vocabulary Size | Description |