Update README.md

<figure>

<caption>Table 1. Evaluation made by the Barcelona Supercomputing Center of their models and BERTIN (beta, seq len 128), from their [preprint](https://arxiv.org/pdf/2107.07253.pdf).</caption>

| Dataset     | Metric   | RoBERTa-b | RoBERTa-l | BETO   | mBERT  | BERTIN |
|-------------|----------|-----------|-----------|--------|--------|--------|

</figure>

All of our models attained good accuracy values during training on the masked-language modeling task, in the range of 0.65, as can be seen in Table 2:

<figure>

<caption>Table 2. Accuracy of the different language models on the main masked-language modeling task.</caption>

| Model | Accuracy |
|----------------------------------------------------|----------|

</figure>
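
To give a concrete sense of the masked-language modeling task behind these accuracies, a checkpoint such as `bertin-project/bertin-roberta-base-spanish` (the beta model) can be queried with the `transformers` fill-mask pipeline. This is a minimal sketch, assuming that checkpoint id on the Hugging Face Hub:

```python
from transformers import pipeline

# RoBERTa-style checkpoints use "<mask>" as the mask token.
fill_mask = pipeline("fill-mask", model="bertin-project/bertin-roberta-base-spanish")

# The model scores candidate tokens for the masked position.
for prediction in fill_mask("Fui a la librería a comprar un <mask>."):
    print(f"{prediction['token_str']!r}: {prediction['score']:.4f}")
```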

### Downstream Tasks

We are currently in the process of applying our language models to downstream tasks.
For simplicity, we will abbreviate the different models as follows:

* **BERT-m**: bert-base-multilingual-cased

<figure>
<caption>
Table 3. Metrics for different downstream tasks, comparing our different models as well as other relevant BERT variations from the literature. The dataset for POS and NER is CoNLL 2002. POS, NER, and PAWS-X runs used max length 512 and batch size 8.
</caption>

| Model        | POS (F1/Acc)            | NER (F1/Acc)         | PAWS-X (Acc) | XNLI-256 (Acc) | XNLI-512 (Acc) |
|--------------|-------------------------|----------------------|--------------|----------------|----------------|
| BERT-m       | 0.9629 / 0.9687         | 0.8539 / 0.9779      | 0.5765       |                |                |
| BERT-wwm     | 0.9642 / 0.9700         | 0.8579 / 0.9783      | 0.8720       |                |                |
| BSC-BNE      | 0.9659 / 0.9707         | 0.8700 / 0.9807      | 0.5765       |                |                |
| Beta         | 0.9638 / 0.9690         | 0.8725 / 0.9812      | 0.5765       |                |                |
| Random       | 0.9656 / 0.9704         | 0.8704 / 0.9807      | 0.8800       |                |                |
| Stepwise     | 0.9656 / 0.9707         | 0.8705 / 0.9809      | 0.8825       |                |                |
| Gaussian     | 0.9662 / 0.9709         | **0.8792 / 0.9816**  | 0.8875       |                |                |
| Random-512   | 0.9660 / 0.9707         | 0.8616 / 0.9803      | 0.6735       |                |                |
| Gaussian-512 | **0.9662 / 0.9714**     | **0.8764 / 0.9819**  | **0.8965**   |                |                |

</figure>
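
For reference, the POS/NER setup described in the caption (CoNLL 2002, max length 512, batch size 8) can be approximated with the `transformers` Trainer. This is a minimal sketch of such a run for the NER task, under those assumptions, not our exact fine-tuning script:

```python
from datasets import load_dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer,
                          TrainingArguments)

# CoNLL 2002 provides Spanish POS and NER tags; we use the NER tags here.
dataset = load_dataset("conll2002", "es")
labels = dataset["train"].features["ner_tags"].feature.names

model_name = "bertin-project/bertin-roberta-base-spanish"
tokenizer = AutoTokenizer.from_pretrained(model_name, add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(labels))

def tokenize_and_align(examples):
    # Tokenize pre-split words and align word-level tags to subword tokens,
    # marking special tokens and continuation subwords with -100 (ignored by the loss).
    tokenized = tokenizer(examples["tokens"], truncation=True, max_length=512,
                          is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(examples["ner_tags"]):
        previous, label_ids = None, []
        for word_id in tokenized.word_ids(batch_index=i):
            if word_id is None or word_id == previous:
                label_ids.append(-100)
            else:
                label_ids.append(tags[word_id])
            previous = word_id
        all_labels.append(label_ids)
    tokenized["labels"] = all_labels
    return tokenized

tokenized = dataset.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments("bertin-ner", per_device_train_batch_size=8,
                           evaluation_strategy="epoch"),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```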

In addition to the tasks above, we also trained the beta model on the SQuAD dataset, achieving an exact match of 50.96 and an F1 of 68.74 (sequence length 128). A full evaluation of this task is still pending.
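
The fine-tuned SQuAD checkpoint has not been published, so the model id in the sketch below is hypothetical; it only illustrates how such a model would be queried through the question-answering pipeline:

```python
from transformers import pipeline

# Hypothetical checkpoint id: the SQuAD fine-tune of the beta model is not published yet.
qa = pipeline("question-answering", model="bertin-project/bertin-beta-squad")

result = qa(
    question="¿Dónde nació Miguel de Cervantes?",
    context="Miguel de Cervantes nació en Alcalá de Henares en 1547.",
)
print(result["answer"], result["score"])
```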

We should note that these runs involved no intensive hyperparameter tuning (number of epochs, learning rates, etc.), yet the results already look good. The PAWS-X numbers are surprising, both for the large differences in performance across models and for the repeated instances of the same 0.5765 accuracy; however, the experiments have been repeated several times and the results are consistent, with only minor differences.
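
For reproducibility, a PAWS-X run along the lines reported above (max length 512, batch size 8) might look like the following sketch, here with the `bertin-project/bertin-base-gaussian` checkpoint; the exact arguments of our experiments may have differed:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# PAWS-X Spanish: paraphrase identification over sentence pairs (label 0/1).
dataset = load_dataset("paws-x", "es")

model_name = "bertin-project/bertin-base-gaussian"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def encode(examples):
    # Encode each sentence pair jointly, truncating to the 512-token limit used above.
    return tokenizer(examples["sentence1"], examples["sentence2"],
                     truncation=True, max_length=512)

encoded = dataset.map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments("bertin-pawsx", per_device_train_batch_size=8),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,
)
trainer.train()
print(trainer.evaluate())
```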
### XNLI