teamapocalypseml/regben2ipa-byt5small

---
license: mit
language:
- bn
metrics:
- wer
- cer
tags:
- seq2seq
- ipa
- bengali
- byt5
---


# Regional Bengali text to IPA transcription - ByT5-small

This model is a fine-tuned version of [umt5-base](https://huggingface.co/google/umt5-base) for generating IPA transcriptions from regional Bengali text.
It was fine-tuned on the dataset of the Bengali.AI competition [“ভাষামূল: মুখের ভাষার খোঁজে”](https://www.kaggle.com/competitions/regipa/overview).

Best scores achieved on the competition leaderboards:
- Public score: 0.01995
- Private score: 0.02072
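
The metadata of this card lists WER and CER as metrics, but the card does not say how the leaderboard scores are computed. The following is a minimal sketch, assuming the standard edit-distance definitions of word error rate and character error rate. The function names `edit_distance`, `wer`, and `cer` are illustrative and not part of this repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For both metrics lower is better, and a perfect transcription scores 0.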


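## Loading & using the model
```python
# Load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("smji/ben2ipa-byt5small")
model = AutoModelForSeq2SeqLM.from_pretrained("smji/ben2ipa-byt5small")

# The input text must have the format: <district> <bengali_text>
text = "<Chittagong> bengali_text_here"
text_ids = tokenizer(text, return_tensors="pt").input_ids

# Generate output token ids and decode them into the IPA transcription.
# max_new_tokens=256 is an assumed limit; raise it for longer inputs.
output_ids = model.generate(text_ids, max_new_tokens=256)
ipa = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(ipa)
```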
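
The required `<district> <bengali_text>` input format can be wrapped in a small helper when transcribing several sentences at once. This is a sketch under assumptions: `format_input` and `transcribe` are hypothetical names, the `tokenizer` and `model` arguments are expected to be the objects loaded above, and `max_new_tokens=256` is an assumed limit rather than a value from the authors.

```python
def format_input(district, text):
    """Build the '<district> <bengali_text>' string the model expects."""
    return f"<{district}> {text}"


def transcribe(tokenizer, model, items, max_new_tokens=256):
    """Transcribe a list of (district, text) pairs in one batch.

    `tokenizer` and `model` are passed in by the caller, so this helper
    does not load anything itself.
    """
    texts = [format_input(district, text) for district, text in items]
    # Pad so that inputs of different lengths fit into one tensor batch
    enc = tokenizer(texts, return_tensors="pt", padding=True)
    output_ids = model.generate(**enc, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

For example, `transcribe(tokenizer, model, [("Chittagong", "bengali_text_here")])` returns a list with one IPA string per input pair.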


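## Using the pipeline
```python
# Use a pipeline as a high-level helper
import torch
from transformers import pipeline

# Run on the first GPU if one is available, otherwise on the CPU
device = 0 if torch.cuda.is_available() else -1
pipe = pipeline("text2text-generation", model="smji/ben2ipa-byt5small", device=device)
```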
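
Once created, the pipeline can be called with a string in the same `<district> <bengali_text>` format. The sketch below assumes the usual return shape of a `text2text-generation` pipeline, a list of dictionaries with a `generated_text` key. The helper name `transcribe_with_pipeline` is hypothetical, and `max_new_tokens=256` is an assumed limit.

```python
def transcribe_with_pipeline(pipe, district, text, max_new_tokens=256):
    """Run one '<district> <bengali_text>' input through a pipeline object.

    `pipe` is the text2text-generation pipeline created by the caller.
    """
    prompt = f"<{district}> {text}"
    result = pipe(prompt, max_new_tokens=max_new_tokens)
    # A single string input yields a list with one result dictionary
    return result[0]["generated_text"]
```

For example, `transcribe_with_pipeline(pipe, "Chittagong", "bengali_text_here")` returns the generated IPA string.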

## Credits
Developed by [S M Jishanul Islam](https://github.com/S-M-J-I), [Sadia Ahmmed](https://github.com/sadia-ahmmed), and [Sahid Hossain Mustakim](https://github.com/sratul35).