l3cube-pune committed
Commit 52c5639
1 Parent(s): a12ae77

Update README.md

Files changed (1): README.md (+50 -87)

README.md CHANGED
@@ -7,45 +7,65 @@ tags:
  - transformers
  language:
  - multilingual
  widget:
- - source_sentence: "दिवाळी आपण मोठ्या उत्साहाने साजरी करतो"
    sentences:
-   - "दिवाळी आपण आनंदाने साजरी करतो"
-   - "दिवाळी हा दिव्यांचा सण आहे"
-   example_title: "Monolingual- Marathi"
-
- - source_sentence: "हम दीपावली उत्साह के साथ मनाते हैं"
    sentences:
-   - "हम दीपावली खुशियों से मनाते हैं"
-   - "दिवाली रोशनी का त्योहार है"
-   example_title: "Monolingual- Hindi"
-
- - source_sentence: "અમે ઉત્સાહથી દિવાળી ઉજવીએ છીએ"
    sentences:
-   - "દિવાળી આપણે ખુશીઓથી ઉજવીએ છીએ"
-   - "દિવાળી એ રોશનીનો તહેવાર છે"
-   example_title: "Monolingual- Gujarati"
-
- - source_sentence: "आम्हाला भारतीय असल्याचा अभिमान आहे"
    sentences:
-   - "हमें भारतीय होने पर गर्व है"
-   - "భారతీయులమైనందుకు గర్విస్తున్నాం"
-   - "અમને ભારતીય હોવાનો ગર્વ છે"
-   example_title: "Cross-lingual 1"
-
- - source_sentence: "ਬਾਰਿਸ਼ ਤੋਂ ਬਾਅਦ ਬਗੀਚਾ ਸੁੰਦਰ ਦਿਖਾਈ ਦਿੰਦਾ ਹੈ"
    sentences:
-   - "മഴയ്ക്ക് ശേഷം പൂന്തോട്ടം മനോഹരമായി കാണപ്പെടുന്നു"
-   - "ବର୍ଷା ପରେ ବଗିଚା ସୁନ୍ଦର ଦେଖାଯାଏ |"
-   - "बारिश के बाद बगीचा सुंदर दिखता है"
-   example_title: "Cross-lingual 2"
  ---

- # {MODEL_NAME}

- This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.

- <!--- Describe your model here -->

  ## Usage (Sentence-Transformers)

@@ -102,61 +122,4 @@ sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

  print("Sentence embeddings:")
  print(sentence_embeddings)
- ```
-
-
- ## Evaluation Results
-
- <!--- Describe how your model was evaluated -->
-
- For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
-
-
- ## Training
- The model was trained with the parameters:
-
- **DataLoader**:
-
- `sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader` of length 80042 with parameters:
- ```
- {'batch_size': 32}
- ```
-
- **Loss**:
-
- `sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss` with parameters:
- ```
- {'scale': 20.0, 'similarity_fct': 'cos_sim'}
- ```
-
- Parameters of the fit()-Method:
- ```
- {
-     "epochs": 1,
-     "evaluation_steps": 0,
-     "evaluator": "sentence_transformers.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator",
-     "max_grad_norm": 1,
-     "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
-     "optimizer_params": {
-         "lr": 2e-05
-     },
-     "scheduler": "WarmupLinear",
-     "steps_per_epoch": null,
-     "warmup_steps": 8004,
-     "weight_decay": 0.01
- }
- ```
-
-
- ## Full Model Architecture
- ```
- SentenceTransformer(
-   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
-   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
- )
- ```
-
- ## Citing & Authors
-
- <!--- Describe where people can find more information -->
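The Training section removed above still documents the original hyperparameters. As a hedged illustration only, the sketch below shows how those values map onto the legacy sentence-transformers `fit()` API; the MURIL base checkpoint and the NLI pair loading are assumptions for illustration and are not part of this commit.

```python
# Illustrative sketch: the training recipe implied by the removed "Training"
# section (NoDuplicatesDataLoader, batch_size=32, MultipleNegativesRankingLoss
# with scale=20 / cos_sim, 1 epoch, lr=2e-5, WarmupLinear, 8004 warmup steps).
from sentence_transformers import SentenceTransformer, InputExample, models, losses, util
from sentence_transformers.datasets import NoDuplicatesDataLoader

# Assumed base checkpoint (google/muril-base-cased) with mean pooling,
# matching the architecture listed in the removed section.
word_embedding_model = models.Transformer("google/muril-base-cased", max_seq_length=512)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode_mean_tokens=True,
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Placeholder NLI-style (premise, entailed hypothesis) pairs; replace with the
# full multilingual NLI corpus before actually running this.
train_examples = [
    InputExample(texts=["दिवाळी आपण मोठ्या उत्साहाने साजरी करतो",
                        "दिवाळी आपण आनंदाने साजरी करतो"]),
]

train_dataloader = NoDuplicatesDataLoader(train_examples, batch_size=32)
train_loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    scheduler="WarmupLinear",
    warmup_steps=8004,
    optimizer_params={"lr": 2e-05},
    weight_decay=0.01,
    max_grad_norm=1,
)
```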
 
  - transformers
  language:
  - multilingual
+ - hi
+ - mr
+ - kn
+ - ta
+ - te
+ - ml
+ - gu
+ - or
+ - pa
+ - bn
  widget:
+ - source_sentence: दिवाळी आपण मोठ्या उत्साहाने साजरी करतो
    sentences:
+   - दिवाळी आपण आनंदाने साजरी करतो
+   - दिवाळी हा दिव्यांचा सण आहे
+   example_title: Monolingual- Marathi
+ - source_sentence: हम दीपावली उत्साह के साथ मनाते हैं
    sentences:
+   - हम दीपावली खुशियों से मनाते हैं
+   - दिवाली रोशनी का त्योहार है
+   example_title: Monolingual- Hindi
+ - source_sentence: અમે ઉત્સાહથી દિવાળી ઉજવીએ છીએ
    sentences:
+   - દિવાળી આપણે ખુશીઓથી ઉજવીએ છીએ
+   - દિવાળી એ રોશનીનો તહેવાર છે
+   example_title: Monolingual- Gujarati
+ - source_sentence: आम्हाला भारतीय असल्याचा अभिमान आहे
    sentences:
+   - हमें भारतीय होने पर गर्व है
+   - భారతీయులమైనందుకు గర్విస్తున్నాం
+   - અમને ભારતીય હોવાનો ગર્વ છે
+   example_title: Cross-lingual 1
+ - source_sentence: ਬਾਰਿਸ਼ ਤੋਂ ਬਾਅਦ ਬਗੀਚਾ ਸੁੰਦਰ ਦਿਖਾਈ ਦਿੰਦਾ ਹੈ
    sentences:
+   - മഴയ്ക്ക് ശേഷം പൂന്തോട്ടം മനോഹരമായി കാണപ്പെടുന്നു
+   - ବର୍ଷା ପରେ ବଗିଚା ସୁନ୍ଦର ଦେଖାଯାଏ |
+   - बारिश के बाद बगीचा सुंदर दिखता है
+   example_title: Cross-lingual 2
  ---

+ # IndicSBERT
+
+ This is a MURIL model (google/muril-base-cased) trained on the NLI dataset of ten major Indian languages. <br>
+ The single model works for Hindi, Marathi, Kannada, Tamil, Telugu, Gujarati, Oriya, Punjabi, Malayalam, and Bengali.
+ The model also has cross-lingual capabilities. <br>
+ Released as a part of project MahaNLP: https://github.com/l3cube-pune/MarathiNLP <br>
+
+ A better sentence-similarity model (a fine-tuned version of this model) is shared here: https://huggingface.co/l3cube-pune/indic-sentence-similarity-sbert <br>
+
+ More details on the dataset, models, and baseline results can be found in our [paper](https://arxiv.org/abs/2211.11187).
+
+ ```
+ @article{joshi2022l3cubemahasbert,
+   title={L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi},
+   author={Joshi, Ananya and Kajale, Aditi and Gadre, Janhavi and Deode, Samruddhi and Joshi, Raviraj},
+   journal={arXiv preprint arXiv:2211.11187},
+   year={2022}
+ }
+ ```
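To make the cross-lingual claim above concrete, here is a minimal, hedged usage sketch with sentence-transformers. The model id `l3cube-pune/indic-sentence-bert-nli` is an assumption about this repository's name and is not stated in the diff; substitute the actual repository id.

```python
# Hedged sketch: encode sentences in two languages with this model and compare
# them with cosine similarity. The model id below is an assumption.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("l3cube-pune/indic-sentence-bert-nli")  # assumed repo id

sentences = [
    "आम्हाला भारतीय असल्याचा अभिमान आहे",   # Marathi: "We are proud to be Indian"
    "हमें भारतीय होने पर गर्व है",            # Hindi: same meaning
    "दिवाळी हा दिव्यांचा सण आहे",             # Marathi: "Diwali is the festival of lights"
]

embeddings = model.encode(sentences)            # shape: (3, 768)
scores = util.cos_sim(embeddings, embeddings)   # pairwise cosine similarities

# The Marathi/Hindi paraphrase pair should score higher than the unrelated sentence.
print(scores)
```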

  ## Usage (Sentence-Transformers)

  print("Sentence embeddings:")
  print(sentence_embeddings)
+ ```
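The usage code itself is collapsed in this diff view; only its last lines (the `print` calls and the closing fence) are visible. As a hedged reconstruction, the sketch below shows the standard mean-pooling recipe that sentence-transformers model cards use for the plain `transformers` path; the model id is again an assumption.

```python
# Hedged reconstruction of the usual HuggingFace-transformers usage that the
# collapsed section typically contains: tokenize, run the encoder, then
# mean-pool token embeddings with the attention mask. Model id is an assumption.
import torch
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padding positions.
    token_embeddings = model_output[0]  # first element: per-token embeddings
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

sentences = ["दिवाळी आपण मोठ्या उत्साहाने साजरी करतो", "हम दीपावली उत्साह के साथ मनाते हैं"]

tokenizer = AutoTokenizer.from_pretrained("l3cube-pune/indic-sentence-bert-nli")  # assumed id
model = AutoModel.from_pretrained("l3cube-pune/indic-sentence-bert-nli")           # assumed id

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    model_output = model(**encoded_input)

sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])

print("Sentence embeddings:")
print(sentence_embeddings)
```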