Automatic Korean translation is now integrated into the newsletter. Each entry has a "KO" link that takes you to a Korean translation of the full paper. This is done with the following workflow:
1. Fetch the list of arXiv IDs from the 🤗 Daily Papers API
2. Split the IDs into sub-lists and distribute them across VMs (spot instances are a good fit, since each job finishes quickly)
3. Commit and push each translated paper, as HTML, to the designated GitHub repository
4. The newsletter includes a link to the HTML version of each paper
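Steps 1 and 2 can be sketched in Python with only the standard library. This is a minimal sketch, not the project's actual code: the endpoint URL `https://huggingface.co/api/daily_papers` and the assumption that each entry stores its arXiv ID under `paper.id` are assumptions about the 🤗 API's response shape, and `extract_arxiv_ids` and `split_into_sublists` are hypothetical helpers that give each VM a near-equal share of the work.

```python
import json
import urllib.request

# Assumed endpoint for the 🤗 Daily Papers API (not taken from the original post).
DAILY_PAPERS_URL = "https://huggingface.co/api/daily_papers"


def fetch_daily_papers(url: str = DAILY_PAPERS_URL) -> list:
    """Download the raw list of Daily Papers entries as parsed JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def extract_arxiv_ids(entries: list) -> list:
    """Pull arXiv IDs out of the entries.

    Assumption: each entry looks like {"paper": {"id": "2401.12345", ...}, ...}.
    Entries without that shape are skipped.
    """
    ids = []
    for entry in entries:
        paper = entry.get("paper") or {}
        paper_id = paper.get("id")
        if paper_id:
            ids.append(paper_id)
    return ids


def split_into_sublists(ids: list, n: int) -> list:
    """Split ids into n contiguous sub-lists whose sizes differ by at most one."""
    if n <= 0:
        raise ValueError("n must be positive")
    base, extra = divmod(len(ids), n)
    sublists, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # the first `extra` sub-lists get one more ID
        sublists.append(ids[start:start + size])
        start += size
    return sublists
```

For example, `split_into_sublists(extract_arxiv_ids(fetch_daily_papers()), 4)` would yield four sub-lists, one per VM. If there are fewer IDs than VMs, some sub-lists are empty and those VMs can simply be skipped.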
Distributing the jobs across many VMs is very easy with [dstack]( ). On each VM, the translation sub-workflow 1) downloads each paper's PDF with the arxiv-dl package, 2) converts the PDF to text with the nougat-ocr package, 3) translates the English text into Korean line by line with a custom-trained model (nlp-with-deeplearning/enko-t5-small-v0) in 🤗 transformers, and 4) reformats the translation into HTML.
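The translation and HTML steps (3 and 4) of the sub-workflow might look roughly like the following. This is a hedged sketch, not the author's implementation: it loads the model named above through the standard 🤗 transformers `AutoTokenizer` / `AutoModelForSeq2SeqLM` classes, but the batching, the absence of any task prefix, `max_new_tokens=256`, and the one-paragraph-per-line HTML layout are all assumptions, and `make_t5_translator`, `translate_lines`, and `to_html` are hypothetical helpers.

```python
import html
from typing import Callable, List

# Model named in the post. Whether it expects a task prefix is not stated, so none is added.
MODEL_ID = "nlp-with-deeplearning/enko-t5-small-v0"


def make_t5_translator(model_id: str = MODEL_ID, max_new_tokens: int = 256) -> Callable[[List[str]], List[str]]:
    """Load a seq2seq model with 🤗 transformers and return a batch translate function."""
    # Imported lazily so the pure helpers below work without transformers/torch installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    def translate(batch: List[str]) -> List[str]:
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)

    return translate


def translate_lines(text: str, translate: Callable[[List[str]], List[str]], batch_size: int = 16) -> List[str]:
    """Translate text line by line in batches, keeping blank lines in place."""
    lines = text.splitlines()
    todo = [i for i, line in enumerate(lines) if line.strip()]  # indices of non-blank lines
    result = list(lines)
    for start in range(0, len(todo), batch_size):
        idx = todo[start:start + batch_size]
        for i, translated in zip(idx, translate([lines[i] for i in idx])):
            result[i] = translated
    return result


def to_html(lines: List[str], title: str) -> str:
    """Wrap translated lines in a minimal HTML page, one <p> per non-blank line."""
    body = "\n".join(f"<p>{html.escape(line)}</p>" for line in lines if line.strip())
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="ko"><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body>\n{body}\n</body></html>\n"
    )
```

Usage would look like `to_html(translate_lines(text, make_t5_translator()), title)`, where `text` is the nougat output for one paper. Passing the translator in as a plain function keeps the line-handling logic testable without downloading the model.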
Many people in Korea are not fluent in English but want to keep up with new developments in AI, so they usually rely on Google Translate or similar services. That is why I built this feature: to give them easier, more direct access to state-of-the-art (SOTA) knowledge.
Are there other countries with similar needs? If so, it would be wonderful to cooperate on supporting more languages. Please reach out if you are interested.
PS: I have always wanted to show the usefulness of open ML models by building a working end-to-end product, and this newsletter does so by featuring T5ForConditionalGeneration (translation) and SOLAR LLM (summarization).
If you want to subscribe to the newsletter
If you want to look at the source code