---
license: mit
language:
- bn
metrics:
- wer
- cer
tags:
- seq2seq
- ipa
- bengali
- byt5
---

# Regional Bengali text to IPA transcription - byT5-small

This is a fine-tuned version of [byt5-small](https://huggingface.co/google/byt5-small) for generating IPA transcriptions from regional Bengali text. It was trained on the dataset of the competition [“ভাষামূল: মুখের ভাষার খোঁজে”](https://www.kaggle.com/competitions/regipa/overview) by Bengali.AI.

Best scores achieved on the competition leaderboards:
- **Public score**: 0.01995
- **Private score**: 0.02072

## Loading & using the model

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("smji/ben2ipa-byt5small")
model = AutoModelForSeq2SeqLM.from_pretrained("smji/ben2ipa-byt5small")

"""
The format of the input text must be:
"""
text = " bengali_text_here"
text_ids = tokenizer(text, return_tensors='pt').input_ids

# Seq2seq inference goes through generate(); calling model(text_ids) directly
# fails because no decoder inputs are supplied.
# max_new_tokens=256 is an arbitrary cap; the default length is short for a byte-level model.
output_ids = model.generate(text_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

## Using the pipeline

```python
# Use a pipeline as a high-level helper
import torch
from transformers import pipeline

# Use the first GPU if one is available, otherwise fall back to the CPU
device = 0 if torch.cuda.is_available() else -1
pipe = pipeline("text2text-generation", model="smji/ben2ipa-byt5small", device=device)
```

## Credits

Developed by [S M Jishanul Islam](https://github.com/S-M-J-I), [Sadia Ahmmed](https://github.com/sadia-ahmmed), [Sahid Hossain Mustakim](https://github.com/sratul35)
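
## Computing WER and CER

The metadata lists `wer` and `cer` as metrics. The sketch below shows one common way to compute them, as edit distance divided by reference length, for a single reference/prediction pair. The function names `edit_distance`, `wer`, and `cer` are illustrative and not part of this model or of the competition's scoring code, which may normalise or aggregate differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Passing the gold IPA string as `reference` and the decoded model output as `hypothesis` gives a per-sample score; averaging over a validation set is one simple way to track progress.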