metadata

license: mit
language:
  - bn
metrics:
  - wer
  - cer
tags:
  - seq2seq
  - ipa
  - bengali
  - byt5

Regional Bengali text to IPA transcription - ByT5-small

This is a fine-tuned version of ByT5-small for generating IPA transcriptions from regional Bengali text. It was fine-tuned on the dataset of the competition “ভাষামূল: মুখের ভাষার খোঁজে” by Bengali.AI.

Best scores achieved on the competition leaderboards:

Public score: 0.01995
Private score: 0.02072
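
The card lists WER and CER as metrics but does not say how the leaderboard scores are computed. As a rough illustration of what these metrics measure, here is a minimal sketch that assumes the standard edit-distance definitions; it is not necessarily the competition's scoring code, and the helper names `edit_distance`, `wer`, and `cer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete an item from the reference
                curr[j - 1] + 1,     # insert an item from the hypothesis
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, counted over Unicode code points."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Because `cer` counts code points, an IPA symbol written with a combining diacritic counts as two characters in this sketch.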

Loading & using the model

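# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("smji/ben2ipa-byt5small")
model = AutoModelForSeq2SeqLM.from_pretrained("smji/ben2ipa-byt5small")

# The input text must have the format: <district> <bengali_text>
text = "<Chittagong> bengali_text_here"
text_ids = tokenizer(text, return_tensors="pt").input_ids

# Generate the output tokens and decode them into the IPA string.
# max_new_tokens=512 is an assumed limit, not a value from the authors;
# ByT5 works on bytes, so the library's default length limit may truncate the output.
output_ids = model.generate(text_ids, max_new_tokens=512)
ipa = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(ipa)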
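
The required input format can be wrapped in a small helper so that the district tag is built consistently. This is a minimal sketch: `format_input` and `format_batch` are hypothetical names, and the card only shows `Chittagong` as a district, so no other district names are assumed here.

```python
def format_input(district: str, text: str) -> str:
    """Build a model input of the form '<district> <bengali_text>'."""
    district = district.strip()
    text = text.strip()
    if not district or not text:
        raise ValueError("both district and text must be non-empty")
    return f"<{district}> {text}"


def format_batch(pairs):
    """Format an iterable of (district, text) pairs into model inputs."""
    return [format_input(district, text) for district, text in pairs]


# Placeholder text, as in the example above
print(format_input("Chittagong", "bengali_text_here"))
```

The resulting strings can be passed to the tokenizer in place of the hand-written `text` value.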

Using the pipeline

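# Use a pipeline as a high-level helper
import torch
from transformers import pipeline

# Run on the first GPU if one is available, otherwise on the CPU
device = 0 if torch.cuda.is_available() else -1
pipe = pipeline("text2text-generation", model="smji/ben2ipa-byt5small", device=device)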
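
Once the pipeline exists, calling it on a formatted string returns a list with one dictionary whose `generated_text` field holds the transcription. The wrapper below is a sketch of that call: `transcribe` is a hypothetical helper, and the `max_new_tokens=512` default is an assumption, not a value from the authors.

```python
def transcribe(pipe, text: str, max_new_tokens: int = 512) -> str:
    """Run a text2text-generation pipeline on one formatted input string.

    `pipe` is expected to behave like the pipeline created above: called
    with a single string, it returns [{"generated_text": "..."}].
    """
    outputs = pipe(text, max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"]


# Usage with the real pipeline (downloads the model on first use):
# ipa = transcribe(pipe, "<Chittagong> bengali_text_here")
```

Passing the pipeline in as an argument keeps the helper independent of how and where the model was loaded.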

Credits

Developed by S M Jishanul Islam, Sadia Ahmmed, and Sahid Hossain Mustakim