license: cc-by-nc-3.0
---

# BERT-base-multilingual-cased finetuned for Part-of-Speech tagging

This is a multilingual BERT model fine-tuned for part-of-speech tagging in English. It is trained on the Penn Treebank (Marcus et al., 1993) and achieves an F1-score of 96.69.

## Usage

A *transformers* pipeline can be used to run the model:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline

model_name = "QCRI/bert-base-multilingual-cased-pos-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer)
outputs = pipeline("A test example")
print(outputs)
```
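Each entry in `outputs` pairs a token with its predicted tag and a confidence score. The decoding step the pipeline performs can be sketched in plain Python: softmax each token's logits over the label set and keep the argmax. The logits and reduced tag set below are made up for illustration, not the model's real label list or scores:

```python
import math

# Hypothetical per-token logits for the input "A test example" over a
# reduced tag set (illustrative only; the real model has many more tags).
labels = ["DT", "JJ", "NN"]
logits = [
    [4.0, 0.1, 0.5],  # "A"
    [0.2, 0.3, 3.5],  # "test"
    [0.1, 0.4, 3.8],  # "example"
]

tags = []
for row in logits:
    # Softmax normalizes the scores into probabilities ...
    total = sum(math.exp(x) for x in row)
    probs = [math.exp(x) / total for x in row]
    # ... and the argmax picks the winning tag for the token.
    tags.append(labels[probs.index(max(probs))])

print(tags)  # ['DT', 'NN', 'NN']
```

The real pipeline additionally handles sub-word tokens produced by the WordPiece tokenizer, so one input word can yield several tagged pieces in `outputs`.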

## Citation

This model was used for all the part-of-speech tagging-based results in *Analyzing Encoded Concepts in Transformer Language Models*, published at NAACL'22. If you find this model useful for your own work, please use the following citation:

```bib
@inproceedings{sajjad-NAACL,
    title={Analyzing Encoded Concepts in Transformer Language Models},
    author={Sajjad, Hassan and Durrani, Nadir and Dalvi, Fahim and Alam, Firoj and Khan, Abdul Rafae and Xu, Jia},
    booktitle={North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)},
    series={NAACL~'22},
    year={2022},
    address={Seattle}
}
```