Dataset?

#1
by 0xbitches - opened

Hi, the model card claims the model is trained using publicly available datasets, but I cannot seem to find any references to what these datasets are. Are there plans to include the actual datasets used?

The pie charts list each of the datasets.

I see. I guess my question, then, is whether there are plans to organize the dataset and publish the selected subsets so that the model training is reproducible.

JetMoE org

Yes, we plan to give more details about the data selection and mixture in our technical report.
