---
language: es
license: gpl-3.0
tags:
- PyTorch
- Transformers
- Token Classification
- roberta
- roberta-base-bne
widget:
- text: "Fue antes de llegar a Sigüeiro, en el Camino de Santiago."
- text: "El proyecto lo financia el Ministerio de Industria y Competitividad."
model-index:
- name: roberta-bne-ner-cds
  results: []
---

# Introduction

This model is a fine-tuned version of [roberta-base-bne](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne) for Named Entity Recognition in the tourism domain of the Way of Saint James (Camino de Santiago). It recognizes four entity types: location (LOC), organization (ORG), person (PER), and miscellaneous (MISC).

## Usage

You can use this model with the Transformers *pipeline* for NER.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("roberta-bne-ner-cds")
model = AutoModelForTokenClassification.from_pretrained("roberta-bne-ner-cds")

example = "Fue antes de llegar a Sigüeiro, en el Camino de Santiago. El proyecto lo financia el Ministerio de Industria y Competitividad."
ner_pipe = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")

for ent in ner_pipe(example):
    print(ent)
```

Output:

```
{'entity_group': 'LOC', 'score': 0.99795026, 'word': ' Sigüeiro', 'start': 22, 'end': 30}
{'entity_group': 'LOC', 'score': 0.997823, 'word': ' Camino de Santiago', 'start': 38, 'end': 56}
{'entity_group': 'ORG', 'score': 0.98481846, 'word': ' Ministerio de Industria y Competitividad', 'start': 85, 'end': 125}
```
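
Note that `aggregation_strategy="simple"` merges the model's subword predictions into whole entity spans, which is why each line above reports a complete entity (`entity_group`) rather than individual tokens.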

## Model performance

entity|precision|recall|f1
-|-|-|-
PER|0.965|0.924|0.944
ORG|0.900|0.701|0.788
LOC|0.982|0.985|0.983
MISC|0.798|0.874|0.834
micro avg|0.964|0.968|0.966
macro avg|0.911|0.871|0.887
weighted avg|0.965|0.968|0.966
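
These are span-level scores of the kind produced by the `seqeval` library. As a hedged illustration (the tag sequences below are placeholders, not this model's actual evaluation data), a report like the table above can be computed from gold and predicted IOB2 tag sequences:

```python
# Hedged illustration: a per-entity precision/recall/F1 report like the
# table above, computed with seqeval. The tag sequences are placeholders,
# not the model card's evaluation data.
from seqeval.metrics import classification_report

y_true = [["B-LOC", "I-LOC", "O", "B-ORG", "O", "B-PER"]]  # gold IOB2 tags
y_pred = [["B-LOC", "I-LOC", "O", "B-ORG", "O", "O"]]      # predicted tags

print(classification_report(y_true, y_pred, digits=3))
```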

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto `TrainingArguments` follows the list):
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0

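As a rough sketch (not the exact training script), these values correspond to a Hugging Face `TrainingArguments` configuration like the following; the output directory is hypothetical, and the Adam betas/epsilon are the library defaults:

```python
# Hedged sketch mapping the hyperparameters above onto TrainingArguments;
# output_dir is hypothetical, and dataset/Trainer wiring is omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="roberta-bne-ner-cds",  # hypothetical
    learning_rate=5e-05,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
    adam_beta1=0.9,      # Transformers defaults, matching
    adam_beta2=0.999,    # "Adam with betas=(0.9,0.999)
    adam_epsilon=1e-08,  #  and epsilon=1e-08" above
)
```
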
### Framework versions

- Transformers 4.25.1
- PyTorch 1.13.0+cu117
- Datasets 2.7.1
- Tokenizers 0.13.2