license: mit
language:
- tr
- en
library_name: transformers
pipeline_tag: translation
Model Card: nllb-3.3B-Turkish
Version: Based on facebook/nllb-200-3.3B, further pretrained on a large English-to-Turkish corpus.
The training dataset for nllb-3.3B-Turkish consists of approximately 490,000 English-Turkish translation pairs, sourced predominantly from movie subtitles. Subtitle text covers a diverse range of linguistic structures, idiomatic expressions, and cultural references, which prepares the model for a variety of translation tasks within that domain.
Intended Use
nllb-3.3B-Turkish is designed for English-to-Turkish translation, particularly of subtitles. It is suitable for media localization, subtitling platforms, and language learning tools, and can be used by developers, linguists, and content creators to make media accessible across languages.
Model Training
The model's training procedure, architecture, and fine-tuning process will be described in detail in an upcoming paper.
Example Outputs
Question: What is the meaning of life? That was all- a simple question; one that tended to close in on one with years, the great revelation had never come. The great revelation perhaps never did come. Instead, there were little daily miracles, illuminations, matches struck unexpectedly in the dark; here was one.
Answer: Hayatın anlamı nedir? Bu basit bir soruydu. Yıllar geçtikçe insanın içine kapanmaya eğilimli olan bir soruydu. Büyük vahiy hiç gelmemişti. Büyük vahiy belki de hiç gelmemişti. Bunun yerine, küçük günlük mucizeler, aydınlatmalar, karanlıkta beklenmedik şekilde ateşler açılırdı. İşte bir tanesi.
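from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
# Placeholder values: set model_name to this model's full Hub repository id; eng_Latn and tur_Latn are the NLLB language codes.
model_name, src_lang, tgt_lang = "nllb-3.3B-Turkish", "eng_Latn", "tur_Latn"
question_prompt = "What is the meaning of life?"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang=src_lang, tgt_lang=tgt_lang)
translator = pipeline('translation', model=model, tokenizer=tokenizer, src_lang=src_lang, tgt_lang=tgt_lang, device_map="auto")
output = translator(question_prompt, max_length=512)[0]['translation_text']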
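Because the model targets subtitle translation, the call above could be wrapped to process a whole subtitle file. The following is a minimal sketch, not part of the model or of the transformers library: `translate_srt` and `TIMING` are hypothetical names, the input is assumed to be in SubRip (.srt) layout, and `translate_fn` is any function that maps an English string to a Turkish string. A stand-in function is used here so the sketch runs without loading the model.

```python
import re
from typing import Callable

# Hypothetical helper for illustration; assumes SubRip (.srt) formatted input.
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate_fn: Callable[[str], str]) -> str:
    """Translate the dialogue of each cue, keeping cue numbers and timings."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Everything up to and including the timing line is copied verbatim.
        timing_idx = next(
            (i for i, line in enumerate(lines) if TIMING.match(line.strip())),
            None,
        )
        if timing_idx is None:
            cues.append(block)  # not a well-formed cue, leave it untouched
            continue
        header = lines[: timing_idx + 1]
        # Join wrapped dialogue lines so each cue is translated as one unit.
        dialogue = " ".join(line.strip() for line in lines[timing_idx + 1:])
        body = [translate_fn(dialogue)] if dialogue else []
        cues.append("\n".join(header + body))
    return "\n\n".join(cues) + "\n"


sample = (
    "1\n00:00:01,000 --> 00:00:02,500\nHello there.\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nHow are\nyou?\n"
)
# str.upper stands in for the real translation function.
result = translate_srt(sample, str.upper)
print(result)
```

To use the model instead of the stand-in, pass a function that calls the pipeline, for example `lambda text: translator(text, max_length=512)[0]['translation_text']`. Joining the lines of a cue before translating gives the model a complete utterance, at the cost of losing the original line breaks inside the cue.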