smji committed on
Commit
20d4698
1 Parent(s): 65e91c7

Update README.md

Files changed (1)
  1. README.md +49 -0
README.md CHANGED
@@ -1,3 +1,52 @@
---
license: mit
language:
- bn
metrics:
- wer
- cer
tags:
- seq2seq
- ipa
- bengali
- byt5
---

# Regional Bengali text to IPA transcription - umt5-base

This is a fine-tuned version of [umt5-base](https://huggingface.co/google/umt5-base) for generating IPA transcriptions from regional Bengali text.
It was fine-tuned on the dataset of the competition [“ভাষামূল: মুখের ভাষার খোঁজে”](https://www.kaggle.com/competitions/regipa/overview) by Bengali.AI.

Best scores achieved on the competition leaderboards:
- **Public score**: 0.01995
- **Private score**: 0.02072
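
The model card metadata lists `wer` and `cer` as metrics. Assuming the leaderboard scores are error rates of this kind (lower is better), which this card does not state explicitly, the following minimal sketch shows how word and character error rates can be computed for a reference and a predicted transcription. The function names are illustrative and not part of this repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference item
                curr[j - 1] + 1,     # insert a hypothesis item
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, a prediction that gets one word wrong out of four reference words has a word error rate of 0.25.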

## Loading & using the model
```python
# Load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("smji/ben2ipa-byt5small")
model = AutoModelForSeq2SeqLM.from_pretrained("smji/ben2ipa-byt5small")

# The input text must have the format: <district> <bengali_text>
text = "<Chittagong> bengali_text_here"
text_ids = tokenizer(text, return_tensors="pt").input_ids

# generate() runs the decoder autoregressively; a bare forward pass
# model(text_ids) would additionally need decoder inputs.
# max_new_tokens is an arbitrary cap; tune it to your input length.
output_ids = model.generate(text_ids, max_new_tokens=256)
ipa = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(ipa)
```
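
The `<district> <bengali_text>` input format can be wrapped in a small helper so the tag is always applied consistently. This is only a sketch: `format_input` is a hypothetical name, and because the set of district tags the model was trained on is not listed here, the helper only checks that both parts are non-empty.

```python
def format_input(district: str, text: str) -> str:
    """Build a model input of the form '<district> <bengali_text>'."""
    # Accept the district with or without surrounding angle brackets.
    district = district.strip().strip("<>").strip()
    text = text.strip()
    if not district or not text:
        raise ValueError("both district and text must be non-empty")
    return f"<{district}> {text}"
```

The result can be passed to the tokenizer in place of a hand-written string, for example `tokenizer(format_input("Chittagong", "bengali_text_here"), return_tensors="pt")`.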


## Using the pipeline
```python
# Use a pipeline as a high-level helper
import torch
from transformers import pipeline

# Use the first GPU if one is available, otherwise fall back to the CPU
device = 0 if torch.cuda.is_available() else -1
pipe = pipeline("text2text-generation", model="smji/ben2ipa-byt5small", device=device)
```
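
Once `pipe` is created, it can be called on one or more formatted inputs. A `text2text-generation` pipeline returns dicts with a `generated_text` key. The sketch below collects those strings. `transcribe` is a hypothetical helper name, and any generation arguments passed through it (such as `max_new_tokens`) are the caller's choice rather than values recommended by the authors.

```python
def transcribe(pipe, texts, **generate_kwargs):
    """Run a text2text-generation pipeline over already formatted inputs.

    `texts` is an iterable of strings in the form '<district> <bengali_text>'.
    Returns one generated string per input.
    """
    outputs = pipe(list(texts), **generate_kwargs)
    results = []
    for out in outputs:
        # Each item is normally a dict; if several sequences are returned
        # per input it is a list of dicts, and the first one is kept.
        if isinstance(out, list):
            out = out[0]
        results.append(out["generated_text"])
    return results


# Example call (requires the `pipe` object created above):
# ipa_texts = transcribe(pipe, ["<Chittagong> bengali_text_here"], max_new_tokens=256)
```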

## Credits
Created by [S M Jishanul Islam](https://github.com/S-M-J-I), [Sadia Ahmmed](https://github.com/sadia-ahmmed), and [Sahid Hossain Mustakim](https://github.com/sratul35).