alvarobartt HF staff committed on
Commit
5f33c26
1 Parent(s): 21bdf0a

Update README.md

  1. README.md +76 -34
README.md CHANGED
@@ -1,4 +1,7 @@
1
  ---
2
  library_name: span-marker
3
  tags:
4
  - span-marker
@@ -6,34 +9,71 @@ tags:
6
  - ner
7
  - named-entity-recognition
8
  - generated_from_span_marker_trainer
 
 
9
  metrics:
10
  - precision
11
  - recall
12
  - f1
13
- widget: []
14
  pipeline_tag: token-classification
15
  ---
16
 
17
- # SpanMarker
18
 
19
- This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that can be used for Named Entity Recognition.
20
 
21
  ## Model Details
22
 
23
  ### Model Description
24
  - **Model Type:** SpanMarker
25
- <!-- - **Encoder:** [Unknown](https://huggingface.co/unknown) -->
26
  - **Maximum Sequence Length:** 512 tokens
27
  - **Maximum Entity Length:** 8 words
28
- <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
29
- <!-- - **Language:** Unknown -->
30
- <!-- - **License:** Unknown -->
31
 
32
  ### Model Sources
33
 
34
  - **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
35
  - **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)
36
 
37
  ## Uses
38
 
39
  ### Direct Use for Inference
@@ -42,36 +82,11 @@ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model that ca
42
  from span_marker import SpanMarkerModel
43
 
44
  # Download from the 🤗 Hub
45
- model = SpanMarkerModel.from_pretrained("span_marker_model_id")
46
  # Run inference
47
- entities = model.predict("Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris.")
48
  ```
49
 
50
- ### Downstream Use
51
- You can finetune this model on your own dataset.
52
-
53
- <details><summary>Click to expand</summary>
54
-
55
- ```python
56
- from span_marker import SpanMarkerModel, Trainer
57
-
58
- # Download from the 🤗 Hub
59
- model = SpanMarkerModel.from_pretrained("span_marker_model_id")
60
-
61
- # Specify a Dataset with "tokens" and "ner_tag" columns
62
- dataset = load_dataset("conll2003") # For example CoNLL2003
63
-
64
- # Initialize a Trainer using the pretrained model & dataset
65
- trainer = Trainer(
66
- model=model,
67
- train_dataset=dataset["train"],
68
- eval_dataset=dataset["validation"],
69
- )
70
- trainer.train()
71
- trainer.save_model("span_marker_model_id-finetuned")
72
- ```
73
- </details>
74
-
75
  <!--
76
  ### Out-of-Scope Use
77
 
@@ -92,6 +107,33 @@ trainer.save_model("span_marker_model_id-finetuned")
92
 
93
  ## Training Details
94
 
95
  ### Framework Versions
96
  - Python: 3.10.12
97
  - SpanMarker: 1.3.1.dev
 
1
  ---
2
+ language:
3
+ - es
4
+ license: cc-by-4.0
5
  library_name: span-marker
6
  tags:
7
  - span-marker
 
9
  - ner
10
  - named-entity-recognition
11
  - generated_from_span_marker_trainer
12
+ datasets:
13
+ - xtreme
14
  metrics:
15
  - precision
16
  - recall
17
  - f1
18
+ widget:
19
+ - text: Con dicha agrupación compartió escenario con bandas y artistas como Hole,
20
+ Live y PJ Harvey.
21
+ - text: Jugaba como defensa y toda su trayectoria la hizo con el Deportivo Saprissa.
22
+ - text: Se encuentra en el Congo, Mozambique, Namibia, Tanzania, Uganda, Zimbabue.
23
+ - text: Fuchu-machi, Toyama-shi, Toyama-ku 939-2713, Honshū-jima, Japón.
24
+ - text: Fue protagonizado por Andrew McCarthy, Jonathan Silverman, Catherine Mary
25
+ Stewart y Terry Kiser.
26
  pipeline_tag: token-classification
27
+ base_model: bert-base-multilingual-cased
28
+ model-index:
29
+ - name: SpanMarker with bert-base-multilingual-cased on xtreme/PAN-X.es
30
+ results:
31
+ - task:
32
+ type: token-classification
33
+ name: Named Entity Recognition
34
+ dataset:
35
+ name: xtreme/PAN-X.es
36
+ type: xtreme
37
+ split: eval
38
+ metrics:
39
+ - type: f1
40
+ value: 0.9186626746506986
41
+ name: F1
42
+ - type: precision
43
+ value: 0.9231154938993816
44
+ name: Precision
45
+ - type: recall
46
+ value: 0.9142526071842411
47
+ name: Recall
48
  ---
49
 
50
+ # SpanMarker with bert-base-multilingual-cased on xtreme/PAN-X.es
51
 
52
+ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [xtreme/PAN-X.es](https://huggingface.co/datasets/xtreme) dataset that can be used for Named Entity Recognition. This SpanMarker model uses [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) as the underlying encoder.
53
 
54
  ## Model Details
55
 
56
  ### Model Description
57
  - **Model Type:** SpanMarker
58
+ - **Encoder:** [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased)
59
  - **Maximum Sequence Length:** 512 tokens
60
  - **Maximum Entity Length:** 8 words
61
+ - **Training Dataset:** [xtreme/PAN-X.es](https://huggingface.co/datasets/xtreme)
62
+ - **Languages:** es
63
+ - **License:** cc-by-4.0
64
 
65
  ### Model Sources
66
 
67
  - **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
68
  - **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)
69
 
70
+ ### Model Labels
+ | Label | Examples |
+ |:------|:------------------------------------------------------------------------------------|
+ | LOC | "Luanda", "Algarrobo ( Chile )", "Condado de Duplin" |
+ | ORG | "Società Sportiva Virtus Lanciano 1924", "Houses of the Holy", "Ejército del Norte" |
+ | PER | "W. G. Sebald", "Tamás Faragó", "José Luis García" |
76
+
77
  ## Uses
78
 
79
  ### Direct Use for Inference
 
82
  from span_marker import SpanMarkerModel
83
 
84
  # Download from the 🤗 Hub
85
+ model = SpanMarkerModel.from_pretrained("alvarobartt/bert-base-multilingual-cased-ner-spanish")
86
  # Run inference
87
+ entities = model.predict("Fuchu-machi, Toyama-shi, Toyama-ku 939-2713, Honshū-jima, Japón.")
88
  ```
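As a minimal sketch of a possible next step, the helper below post-processes a list of predicted entities. It assumes that `model.predict` returns a list of dictionaries with at least the keys `span`, `label`, and `score`, which is how the SpanMarker documentation describes the output. The name `group_entities`, the `min_score` threshold, and the sample list are hypothetical; the sample is hand-written for illustration and is not real output from this model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Group predicted spans by label, keeping spans that score at least min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Hand-written placeholder predictions (NOT real model output), shaped like
# the dictionaries that SpanMarker's predict() is assumed to return.
sample_entities = [
    {"span": "Toyama-shi", "label": "LOC", "score": 0.98},
    {"span": "Japón", "label": "LOC", "score": 0.99},
    {"span": "939-2713", "label": "ORG", "score": 0.12},
]

print(group_entities(sample_entities))
```

In practice the list returned by `model.predict(...)` would be passed in place of `sample_entities`; the threshold is a design choice that trades recall for precision.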
89
 
90
  <!--
91
  ### Out-of-Scope Use
92
 
 
107
 
108
  ## Training Details
109
 
110
+ ### Training Set Metrics
+
+ | Training set | Min | Median | Max |
+ |:----------------------|:----|:-------|:----|
+ | Sentence length | 3 | 6.4642 | 64 |
+ | Entities per sentence | 1 | 1.2375 | 24 |
116
+
117
+ ### Training Hyperparameters
+
+ - learning_rate: 5e-05
+ - train_batch_size: 8
+ - eval_batch_size: 4
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 2
127
+
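To illustrate how `learning_rate`, `lr_scheduler_type: linear`, and `lr_scheduler_warmup_ratio: 0.1` interact, here is a rough sketch of a linear schedule with warmup in plain Python. It follows the usual shape of such schedules (linear ramp from zero, then linear decay to zero) and is not taken from the training code behind this commit. The function names are hypothetical, and `TOTAL_STEPS = 5000` is an assumption based on the last step logged in the training results.

```python
import math


def warmup_steps_for(total_steps, warmup_ratio=0.1):
    """Number of warmup steps implied by a warmup ratio (assumed to round up)."""
    return math.ceil(total_steps * warmup_ratio)


def linear_schedule_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    warmup_steps = warmup_steps_for(total_steps, warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup phase.
        return base_lr * (step / max(1, warmup_steps))
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


TOTAL_STEPS = 5000  # assumption: approximate number of optimizer steps

for step in (0, 250, warmup_steps_for(TOTAL_STEPS), 2750, TOTAL_STEPS):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak rate of 5e-05 is reached once the warmup phase ends, roughly a tenth of the way through training.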
128
+ ### Training Results
+ | Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
+ |:------:|:----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
+ | 0.3998 | 1000 | 0.0388 | 0.8761 | 0.8641 | 0.8701 | 0.9223 |
+ | 0.7997 | 2000 | 0.0326 | 0.8995 | 0.8740 | 0.8866 | 0.9341 |
+ | 1.1995 | 3000 | 0.0277 | 0.9076 | 0.9019 | 0.9047 | 0.9424 |
+ | 1.5994 | 4000 | 0.0261 | 0.9143 | 0.9113 | 0.9128 | 0.9473 |
+ | 1.9992 | 5000 | 0.0234 | 0.9231 | 0.9143 | 0.9187 | 0.9502 |
136
+
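The reported F1 can be cross-checked against precision and recall, since F1 is their harmonic mean. The short sketch below uses the full-precision values from the `model-index` metadata in this commit; the helper name `f1_score` is only illustrative.

```python
# Values copied from the model-index metadata added in this commit.
precision = 0.9231154938993816
recall = 0.9142526071842411
reported_f1 = 0.9186626746506986


def f1_score(p, r):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


computed_f1 = f1_score(precision, recall)
print(computed_f1, abs(computed_f1 - reported_f1))
```

The computed value agrees with the reported F1 and rounds to the 0.9187 shown in the final row of the training results table.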
137
  ### Framework Versions
138
  - Python: 3.10.12
139
  - SpanMarker: 1.3.1.dev