---
library_name: transformers
license: apache-2.0
datasets:
- Vikhrmodels/Flan_translated_300k
- d0rj/OpenOrca-ru
language:
- ru
- en
---

# Model Card for ru-rope-t5-small-instruct

A small Russian Rotary Position Embedding (RoPE) T5 model after instruction tuning.

## Model Details

The model was trained on a Russian corpus with a mix of English using the [Mixture-of-Denoisers](https://arxiv.org/abs/2205.05131v1) pre-training method from [UL2](https://huggingface.co/google/ul2) on sequences of length 1024. Because the relative position bias is replaced with rotary position encoding, training with Flash Attention 2 is possible.

- **Model type:** [RoPE T5](https://huggingface.co/melmoth/ru-rope-t5-small-instruct/blob/main/t5.py)
- **Language(s) (NLP):** Russian, English

## Uses

Fine-tuning for downstream tasks.

## Bias, Risks, and Limitations

Despite the instruction tuning, zero-shot use is not recommended due to the small model size.

## Training Details

### Training Data

A corpus of Russian texts from [Vikhr](https://huggingface.co/Vikhrmodels), filtered by [FRED-T5-1.7B](https://huggingface.co/ai-forever/FRED-T5-1.7B) perplexity. The instruction data is a translated English set.

### Training Procedure

AdamWScale was used instead of Adafactor for stable training without loss spikes.

#### Metrics

![rsg](rsg_results.png)

## Model Card Contact

[@TheMelmoth](https://t.me/TheMelmoth)
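
## How to Get Started with the Model

A minimal loading sketch for fine-tuning or quick inspection. It assumes the custom architecture in the repository's `t5.py` is exposed through the `transformers` Auto classes with `trust_remote_code=True`; the prompt format and generation settings are illustrative only, so verify them against the repository files.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "melmoth/ru-rope-t5-small-instruct"

# trust_remote_code is assumed to be needed because the RoPE T5 architecture
# lives in the repository's custom t5.py rather than in transformers itself.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)

# Illustrative Russian prompt ("Translate to English: Hello, world!");
# adapt the prompt format to your own fine-tuning setup.
inputs = tokenizer("Переведи на английский: Привет, мир!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since the position bias is replaced with rotary encoding, passing `attn_implementation="flash_attention_2"` to `from_pretrained` may work on supported GPUs, but this is untested here.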