diyclassics committed on
Commit
c74d2a2
1 Parent(s): 74f9469

Update spaCy pipeline

README.md CHANGED
@@ -14,72 +14,72 @@ model-index:
  metrics:
  - name: NER Precision
  type: precision
- value: 0.9546354339
+ value: 0.9631931948
  - name: NER Recall
  type: recall
- value: 0.9408226305
+ value: 0.9570871261
  - name: NER F Score
  type: f_score
- value: 0.947678703
+ value: 0.9601304525
  - task:
  name: TAG
  type: token-classification
  metrics:
  - name: TAG (XPOS) Accuracy
  type: accuracy
- value: 0.9649296122
+ value: 0.9640958514
  - task:
  name: POS
  type: token-classification
  metrics:
  - name: POS (UPOS) Accuracy
  type: accuracy
- value: 0.9829896358
+ value: 0.9831838987
  - task:
  name: MORPH
  type: token-classification
  metrics:
  - name: Morph (UFeats) Accuracy
  type: accuracy
- value: 0.9568359544
+ value: 0.9581374663
  - task:
  name: LEMMA
  type: token-classification
  metrics:
  - name: Lemma Accuracy
  type: accuracy
- value: 0.9525518858
+ value: 0.9531809911
  - task:
  name: UNLABELED_DEPENDENCIES
  type: token-classification
  metrics:
  - name: Unlabeled Attachment Score (UAS)
  type: f_score
- value: 0.8847980068
+ value: 0.8882308136
  - task:
  name: LABELED_DEPENDENCIES
  type: token-classification
  metrics:
  - name: Labeled Attachment Score (LAS)
  type: f_score
- value: 0.844676641
+ value: 0.8492401865
  - task:
  name: SENTS
  type: token-classification
  metrics:
  - name: Sentences F-Score
  type: f_score
- value: 0.9438152831
+ value: 0.9959496442
  ---
  | Feature | Description |
  | --- | --- |
  | **Name** | `la_core_web_trf` |
- | **Version** | `3.7.6` |
- | **spaCy** | `>=3.7.4,<3.8.0` |
- | **Default Pipeline** | `transformer`, `normer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `lookup_lemmatizer`, `ner` |
- | **Components** | `transformer`, `normer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `lookup_lemmatizer`, `ner` |
+ | **Version** | `3.7.7` |
+ | **spaCy** | `>=3.7.5,<3.8.0` |
+ | **Default Pipeline** | `senter`, `transformer`, `normer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `lookup_lemmatizer`, `ner` |
+ | **Components** | `senter`, `transformer`, `normer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `lookup_lemmatizer`, `ner` |
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
- | **Sources** | UD_Latin-Perseus<br>UD_Latin-PROIEL<br>UD_Latin-ITTB<br>UD_Latin-LLCT<br>UD_Latin-UDante |
+ | **Sources** | UD_Latin-Perseus (via Gamba/Zeman 2023)<br>UD_Latin-PROIEL (via Gamba/Zeman 2023)<br>UD_Latin-ITTB (via Gamba/Zeman 2023)<br>UD_Latin-LLCT (via Gamba/Zeman 2023<br>UD_Latin-UDante (via Gamba/Zeman 2023)<br>CIRCSE/LASLA: LASLA Corpus |
  | **License** | `MIT` |
  | **Author** | [Patrick J. Burns; with Nora Bernhardt [ner], Tim Geelhaar [tagger, morphologizer, parser, ner], Vincent Koch [ner]](https://diyclassics.github.io/) |

@@ -102,21 +102,21 @@ model-index:

  | Type | Score |
  | --- | --- |
- | `ENTS_F` | 94.98 |
- | `ENTS_P` | 94.10 |
- | `ENTS_R` | 95.89 |
- | `TRANSFORMER_LOSS` | 3954242.59 |
- | `NER_LOSS` | 8199.98 |
- | `TAG_ACC` | 96.49 |
- | `POS_ACC` | 98.30 |
- | `MORPH_ACC` | 95.68 |
- | `LEMMA_ACC` | 95.26 |
- | `DEP_UAS` | 88.48 |
- | `DEP_LAS` | 84.47 |
- | `SENTS_P` | 94.66 |
- | `SENTS_R` | 94.10 |
- | `SENTS_F` | 94.38 |
- | `TAGGER_LOSS` | 41020.70 |
- | `MORPHOLOGIZER_LOSS` | 348603.00 |
- | `TRAINABLE_LEMMATIZER_LOSS` | 349936.53 |
- | `PARSER_LOSS` | 2759001.09 |
+ | `ENTS_F` | 95.43 |
+ | `ENTS_P` | 94.87 |
+ | `ENTS_R` | 95.99 |
+ | `TRANSFORMER_LOSS` | 3054585.96 |
+ | `NER_LOSS` | 9051.00 |
+ | `TAG_ACC` | 96.33 |
+ | `POS_ACC` | 98.31 |
+ | `MORPH_ACC` | 95.75 |
+ | `LEMMA_ACC` | 95.27 |
+ | `DEP_UAS` | 88.61 |
+ | `DEP_LAS` | 84.72 |
+ | `SENTS_P` | 94.93 |
+ | `SENTS_R` | 94.41 |
+ | `SENTS_F` | 94.67 |
+ | `TAGGER_LOSS` | 28864.85 |
+ | `MORPHOLOGIZER_LOSS` | 246784.11 |
+ | `TRAINABLE_LEMMATIZER_LOSS` | 242230.83 |
+ | `PARSER_LOSS` | 2414158.66 |
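
Reviewer note: the NER precision, recall, and F score in the model-index block can be cross-checked against each other. The sketch below assumes the F score is the usual harmonic mean of precision and recall; `f1_score` is an illustrative helper name, not a spaCy function.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# NER values copied from the model-index block of README.md in this commit.
OLD_NER = {"p": 0.9546354339, "r": 0.9408226305, "f": 0.947678703}
NEW_NER = {"p": 0.9631931948, "r": 0.9570871261, "f": 0.9601304525}

for label, ner in (("old", OLD_NER), ("new", NEW_NER)):
    recomputed = f1_score(ner["p"], ner["r"])
    # The reported F score should match the recomputed one up to rounding.
    print(f"{label}: reported={ner['f']:.10f} recomputed={recomputed:.10f}")

print(f"NER F change: {NEW_NER['f'] - OLD_NER['f']:+.4f}")
```

Both the old and the new triples are internally consistent under this definition.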
config.cfg CHANGED
@@ -10,7 +10,7 @@ seed = 0

  [nlp]
  lang = "la"
- pipeline = ["transformer","normer","tagger","morphologizer","trainable_lemmatizer","parser","lookup_lemmatizer","ner"]
+ pipeline = ["senter","transformer","normer","tagger","morphologizer","trainable_lemmatizer","parser","lookup_lemmatizer","ner"]
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
  disabled = []
  before_creation = null
@@ -129,6 +129,26 @@ use_fast = true

  [components.parser.model.tok2vec.transformer_config]

+ [components.senter]
+ factory = "senter"
+ overwrite = false
+ scorer = {"@scorers":"spacy.senter_scorer.v1"}
+
+ [components.senter.model]
+ @architectures = "spacy.Tagger.v2"
+ nO = null
+ normalize = false
+
+ [components.senter.model.tok2vec]
+ @architectures = "spacy.HashEmbedCNN.v2"
+ pretrained_vectors = null
+ width = 12
+ depth = 1
+ embed_size = 2000
+ window_size = 1
+ maxout_pieces = 2
+ subword_features = true
+
  [components.tagger]
  factory = "tagger"
  label_smoothing = 0.0
@@ -251,6 +271,9 @@ eps = 0.00000001
  learn_rate = 0.001

  [training.score_weights]
+ sents_f = 0.0
+ sents_p = null
+ sents_r = null
  tag_acc = 0.2
  pos_acc = 0.1
  morph_acc = 0.1
@@ -259,9 +282,6 @@ lemma_acc = 0.2
  dep_uas = 0.1
  dep_las = 0.1
  dep_las_per_type = null
- sents_p = null
- sents_r = null
- sents_f = 0.0
  ents_f = 0.2
  ents_p = 0.0
  ents_r = 0.0
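
Reviewer note: the `[training.score_weights]` hunk only moves the `sents_*` keys and leaves their values unchanged, so `sents_f` still contributes nothing to the combined training score. The sketch below illustrates this under two assumptions: that the combined score is a plain weighted sum of the metrics, and that `null` weights are skipped. It uses only the weights visible in this diff (including `lemma_acc = 0.2` from the hunk header) and the updated model-index values from README.md. `weighted_score` is a hypothetical helper, not spaCy's own implementation.

```python
from typing import Mapping, Optional


def weighted_score(scores: Mapping[str, float],
                   weights: Mapping[str, Optional[float]]) -> float:
    """Sum score * weight over all metrics, skipping null (None) weights."""
    return sum(
        scores.get(name, 0.0) * weight
        for name, weight in weights.items()
        if weight is not None
    )


# Weights as they appear in the diff; config `null` is written as None.
SCORE_WEIGHTS = {
    "sents_f": 0.0, "sents_p": None, "sents_r": None,
    "tag_acc": 0.2, "pos_acc": 0.1, "morph_acc": 0.1,
    "lemma_acc": 0.2,
    "dep_uas": 0.1, "dep_las": 0.1, "dep_las_per_type": None,
    "ents_f": 0.2, "ents_p": 0.0, "ents_r": 0.0,
}

# Updated metric values from the README.md model-index in this commit.
SCORES = {
    "sents_f": 0.9959496442,
    "tag_acc": 0.9640958514,
    "pos_acc": 0.9831838987,
    "morph_acc": 0.9581374663,
    "lemma_acc": 0.9531809911,
    "dep_uas": 0.8882308136,
    "dep_las": 0.8492401865,
    "ents_f": 0.9601304525,
}

total_weight = sum(w for w in SCORE_WEIGHTS.values() if w is not None)
print(f"sum of visible weights: {total_weight:.2f}")
print(f"weighted score: {weighted_score(SCORES, SCORE_WEIGHTS):.4f}")
```

Because `sents_f` has weight 0.0, the large jump in sentence F-score from the new `senter` component does not change this combined figure.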
functions.py CHANGED
@@ -200,11 +200,8 @@ import string
  blank_nlp = spacy.blank("la")
  lookups = Lookups()

- try:
-     lookups_data = load_lookups(lang=blank_nlp.vocab.lang, tables=["lemma_lookup"])
- except:
-     lookups_data = lookups.from_disk("scripts/lemmatizer_lookups")

+ lookups_data = load_lookups(lang=blank_nlp.vocab.lang, tables=["lemma_lookup"])
  LOOKUPS = lookups_data.get_table("lemma_lookup")

  predicted_lemma_getter = lambda token: token.lemma_
la_core_web_trf-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:277475d238ab6b4a9b70acee5cf3a8dfaefae220ba9ab6c8c674b757ea1b897c
- size 2526069269
+ oid sha256:391c1897518a0049a9648a7ae67f931c778bec052dd7a5d98cb012cab57a9929
+ size 2509328343
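
Reviewer note: the wheel and model files are tracked as Git LFS pointers, so the diff shows only the `oid` and `size` fields rather than the binary contents. A minimal sketch of reading the two pointers above and comparing sizes follows; `parse_lfs_pointer` is an illustrative helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the wheel diff in this commit.
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:277475d238ab6b4a9b70acee5cf3a8dfaefae220ba9ab6c8c674b757ea1b897c
size 2526069269
"""
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:391c1897518a0049a9648a7ae67f931c778bec052dd7a5d98cb012cab57a9929
size 2509328343
"""

old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)
delta = new["size"] - old["size"]
print(f"wheel size change: {delta} bytes ({delta / old['size']:+.2%})")
```

The wheel is slightly smaller after this commit even though a `senter` component was added. The shrink is in line with the smaller `trainable_lemmatizer/model` and `trainable_lemmatizer/trees` files later in the diff.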
meta.json CHANGED
@@ -1,14 +1,14 @@
1
  {
2
  "lang":"la",
3
  "name":"core_web_trf",
4
- "version":"3.7.6",
5
  "description":"",
6
  "author":"Patrick J. Burns; with Nora Bernhardt [ner], Tim Geelhaar [tagger, morphologizer, parser, ner], Vincent Koch [ner]",
7
  "email":"[email protected]",
8
  "url":"https://diyclassics.github.io/",
9
  "license":"MIT",
10
- "spacy_version":">=3.7.4,<3.8.0",
11
- "spacy_git_version":"bff8725f4",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
@@ -811,6 +811,7 @@
811
  ]
812
  },
813
  "pipeline":[
 
814
  "transformer",
815
  "normer",
816
  "tagger",
@@ -821,6 +822,7 @@
821
  "ner"
822
  ],
823
  "components":[
 
824
  "transformer",
825
  "normer",
826
  "tagger",
@@ -834,301 +836,304 @@
834
 
835
  ],
836
  "performance":{
837
- "ents_f":0.947678703,
838
- "ents_p":0.9546354339,
839
- "ents_r":0.9408226305,
840
  "ents_per_type":{
841
  "PERSON":{
842
- "p":0.9693645208,
843
- "r":0.9541564792,
844
- "f":0.9617003799
845
  },
846
  "LOC":{
847
- "p":0.8880994671,
848
- "r":0.8833922261,
849
- "f":0.8857395926
850
  },
851
  "NORP":{
852
- "p":0.9904761905,
853
  "r":0.9369369369,
854
- "f":0.962962963
855
  }
856
  },
857
- "transformer_loss":39542.4259041837,
858
- "ner_loss":86.9198837828,
859
- "tag_acc":0.9649296122,
860
- "pos_acc":0.9829896358,
861
- "morph_acc":0.9568359544,
 
 
 
862
  "morph_per_feat":{
863
  "Case":{
864
- "p":0.9621140517,
865
- "r":0.9579640812,
866
- "f":0.9600345817
867
  },
868
  "Gender":{
869
- "p":0.96884072,
870
- "r":0.9651582339,
871
- "f":0.9669959711
872
  },
873
  "Number":{
874
- "p":0.9882914319,
875
- "r":0.98518503,
876
- "f":0.9867357861
877
  },
878
  "Mood":{
879
- "p":0.9840446359,
880
- "r":0.982505864,
881
- "f":0.9832746479
882
  },
883
  "Person":{
884
- "p":0.9934610355,
885
- "r":0.9920872705,
886
- "f":0.9927736777
887
  },
888
  "Tense":{
889
- "p":0.976046915,
890
- "r":0.9767730203,
891
- "f":0.9764098327
892
  },
893
  "VerbForm":{
894
- "p":0.9853009644,
895
- "r":0.9810596026,
896
- "f":0.9831757093
897
  },
898
  "Voice":{
899
- "p":0.9817962212,
900
- "r":0.9851960869,
901
- "f":0.9834932158
902
  }
903
  },
904
- "lemma_acc":0.9525518858,
905
- "dep_uas":0.8847980068,
906
- "dep_las":0.844676641,
907
  "dep_las_per_type":{
908
  "root":{
909
- "p":0.9244532803,
910
- "r":0.9175619382,
911
- "f":0.9209947183
912
  },
913
  "cop":{
914
- "p":0.8362790698,
915
- "r":0.8565983802,
916
- "f":0.8463167804
917
  },
918
  "nsubj":{
919
- "p":0.8507099391,
920
- "r":0.8474439281,
921
- "f":0.8490737929
922
  },
923
  "nmod":{
924
- "p":0.8432888264,
925
- "r":0.831792976,
926
- "f":0.837501454
927
  },
928
  "obj":{
929
- "p":0.8534423995,
930
- "r":0.8534423995,
931
- "f":0.8534423995
932
  },
933
  "det":{
934
- "p":0.9289997601,
935
- "r":0.9339281408,
936
- "f":0.9314574315
937
  },
938
  "cc":{
939
- "p":0.9029636711,
940
- "r":0.9147699758,
941
- "f":0.9088284821
942
  },
943
  "conj":{
944
- "p":0.7633358377,
945
- "r":0.765060241,
946
- "f":0.7641970666
947
  },
948
  "nummod":{
949
- "p":0.9273504274,
950
- "r":0.9333333333,
951
- "f":0.9303322615
952
  },
953
  "case":{
954
- "p":0.9754917768,
955
- "r":0.9819834442,
956
- "f":0.9787268462
957
  },
958
  "obl":{
959
- "p":0.7992770167,
960
- "r":0.8219526511,
961
- "f":0.8104562554
962
  },
963
  "acl":{
964
- "p":0.7413308341,
965
- "r":0.6743393009,
966
- "f":0.70625
967
  },
968
  "ccomp":{
969
- "p":0.6262254902,
970
- "r":0.641959799,
971
- "f":0.6339950372
972
  },
973
  "acl:relcl":{
974
- "p":0.7034617897,
975
- "r":0.6988968202,
976
- "f":0.701171875
977
  },
978
  "advmod":{
979
- "p":0.7974038016,
980
- "r":0.7992565056,
981
- "f":0.7983290787
982
  },
983
  "mark":{
984
- "p":0.8836696091,
985
- "r":0.8751170777,
986
- "f":0.879372549
987
  },
988
  "xcomp":{
989
- "p":0.8135860979,
990
- "r":0.7779456193,
991
- "f":0.7953667954
992
  },
993
  "csubj:pass":{
994
- "p":0.7046632124,
995
- "r":0.6476190476,
996
- "f":0.6749379653
997
  },
998
  "advmod:lmod":{
999
- "p":0.896797153,
1000
- "r":0.8600682594,
1001
- "f":0.8780487805
1002
  },
1003
  "obl:arg":{
1004
- "p":0.8137304392,
1005
- "r":0.8043912176,
1006
- "f":0.809033877
1007
  },
1008
  "csubj":{
1009
- "p":0.7334337349,
1010
- "r":0.763322884,
1011
- "f":0.7480798771
1012
  },
1013
  "discourse":{
1014
- "p":0.8848829855,
1015
- "r":0.8876903553,
1016
- "f":0.8862844473
1017
  },
1018
  "advcl":{
1019
- "p":0.6790653314,
1020
- "r":0.6966731898,
1021
- "f":0.6877565805
1022
  },
1023
  "nsubj:pass":{
1024
- "p":0.8203125,
1025
- "r":0.843373494,
1026
- "f":0.8316831683
1027
  },
1028
  "advmod:tmod":{
1029
- "p":0.7389380531,
1030
- "r":0.7660550459,
1031
- "f":0.7522522523
1032
  },
1033
  "advmod:emph":{
1034
- "p":0.7091836735,
1035
- "r":0.695,
1036
- "f":0.702020202
1037
  },
1038
  "amod":{
1039
- "p":0.8843806104,
1040
  "r":0.8875675676,
1041
- "f":0.885971223
1042
  },
1043
  "conj:expl":{
1044
- "p":0.4375,
1045
- "r":0.3010752688,
1046
- "f":0.3566878981
1047
  },
1048
  "advmod:neg":{
1049
- "p":0.891634981,
1050
- "r":0.8933333333,
1051
- "f":0.8924833492
1052
  },
1053
  "advcl:cmp":{
1054
- "p":0.6405919662,
1055
- "r":0.6084337349,
1056
- "f":0.6240988671
1057
  },
1058
  "nsubj:outer":{
1059
- "p":0.3333333333,
1060
  "r":0.3684210526,
1061
- "f":0.35
 
 
 
 
 
1062
  },
1063
  "advcl:abs":{
1064
- "p":0.8487394958,
1065
- "r":0.8440111421,
1066
- "f":0.8463687151
1067
  },
1068
  "aux:pass":{
1069
- "p":0.9431438127,
1070
  "r":0.9575551783,
1071
- "f":0.950294861
 
 
 
 
 
 
 
 
 
 
1072
  },
1073
  "dep":{
1074
  "p":0.0,
1075
  "r":0.0,
1076
  "f":0.0
1077
  },
1078
- "advcl:pred":{
1079
- "p":0.3203883495,
1080
- "r":0.1843575419,
1081
- "f":0.2340425532
1082
- },
1083
- "orphan":{
1084
- "p":0.5263157895,
1085
- "r":0.3703703704,
1086
- "f":0.4347826087
1087
- },
1088
  "aux":{
1089
- "p":0.8865248227,
1090
- "r":0.8992805755,
1091
- "f":0.8928571429
1092
  },
1093
  "appos":{
1094
- "p":0.9255813953,
1095
- "r":0.8917102315,
1096
- "f":0.9083301636
1097
- },
1098
- "fixed":{
1099
- "p":0.955,
1100
- "r":0.8842592593,
1101
- "f":0.9182692308
1102
  },
1103
  "parataxis":{
1104
- "p":0.4705882353,
1105
- "r":0.3333333333,
1106
- "f":0.3902439024
 
 
 
 
 
1107
  },
1108
  "flat":{
1109
- "p":0.8398268398,
1110
- "r":0.7519379845,
1111
- "f":0.7934560327
1112
  },
1113
  "vocative":{
1114
- "p":0.6862745098,
1115
  "r":0.5737704918,
1116
- "f":0.625
1117
- },
1118
- "dislocated":{
1119
- "p":0.3333333333,
1120
- "r":0.0967741935,
1121
- "f":0.15
1122
  },
1123
- "dislocated:obj":{
1124
- "p":0.7978723404,
1125
- "r":0.7731958763,
1126
- "f":0.7853403141
1127
  },
1128
  "reparandum":{
1129
- "p":0.2,
1130
  "r":0.0833333333,
1131
- "f":0.1176470588
1132
  },
1133
  "dislocated:nsubj":{
1134
  "p":0.0,
@@ -1140,35 +1145,35 @@
1140
  "r":0.0,
1141
  "f":0.0
1142
  },
 
 
 
 
 
1143
  "obl:agent":{
1144
- "p":0.5344827586,
1145
- "r":0.3069306931,
1146
- "f":0.3899371069
1147
  },
1148
  "flat:name":{
1149
- "p":0.7428571429,
1150
  "r":0.8965517241,
1151
- "f":0.8125
1152
- },
1153
- "flat:foreign":{
1154
- "p":0.375,
1155
- "r":1.0,
1156
- "f":0.5454545455
1157
  },
1158
  "obl:tmod":{
1159
- "p":0.1666666667,
1160
  "r":0.125,
1161
- "f":0.1428571429
1162
  },
1163
- "obl:lmod":{
1164
  "p":0.0,
1165
  "r":0.0,
1166
  "f":0.0
1167
  },
1168
- "ccomp:reported":{
1169
- "p":0.3846153846,
1170
- "r":0.4347826087,
1171
- "f":0.4081632653
1172
  },
1173
  "parataxis:reporting":{
1174
  "p":0.0,
@@ -1206,20 +1211,20 @@
1206
  "f":0.0
1207
  }
1208
  },
1209
- "sents_p":0.9466254963,
1210
- "sents_r":0.9410217058,
1211
- "sents_f":0.9438152831,
1212
- "tagger_loss":410.2069541283,
1213
- "morphologizer_loss":3486.0300176998,
1214
- "trainable_lemmatizer_loss":3499.3652743734,
1215
- "parser_loss":27590.0109430107
1216
  },
1217
  "sources":[
1218
- "UD_Latin-Perseus",
1219
- "UD_Latin-PROIEL",
1220
- "UD_Latin-ITTB",
1221
- "UD_Latin-LLCT",
1222
- "UD_Latin-UDante"
 
 
1223
  ],
1224
  "requirements":[
1225
  "spacy_lookups_data @ git+https://github.com/diyclassics/spacy-lookups-data.git#egg=spacy-lookups-data",
 
1
  {
2
  "lang":"la",
3
  "name":"core_web_trf",
4
+ "version":"3.7.7",
5
  "description":"",
6
  "author":"Patrick J. Burns; with Nora Bernhardt [ner], Tim Geelhaar [tagger, morphologizer, parser, ner], Vincent Koch [ner]",
7
  "email":"[email protected]",
8
  "url":"https://diyclassics.github.io/",
9
  "license":"MIT",
10
+ "spacy_version":">=3.7.5,<3.8.0",
11
+ "spacy_git_version":"a6d0fc360",
12
  "vectors":{
13
  "width":0,
14
  "vectors":0,
 
811
  ]
812
  },
813
  "pipeline":[
814
+ "senter",
815
  "transformer",
816
  "normer",
817
  "tagger",
 
822
  "ner"
823
  ],
824
  "components":[
825
+ "senter",
826
  "transformer",
827
  "normer",
828
  "tagger",
 
836
 
837
  ],
838
  "performance":{
839
+ "ents_f":0.9601304525,
840
+ "ents_p":0.9631931948,
841
+ "ents_r":0.9570871261,
842
  "ents_per_type":{
843
  "PERSON":{
844
+ "p":0.966012543,
845
+ "r":0.9727031982,
846
+ "f":0.9693463256
847
  },
848
  "LOC":{
849
+ "p":0.9465290807,
850
+ "r":0.8913427562,
851
+ "f":0.9181073703
852
  },
853
  "NORP":{
854
+ "p":1.0,
855
  "r":0.9369369369,
856
+ "f":0.9674418605
857
  }
858
  },
859
+ "transformer_loss":11904.1139860171,
860
+ "ner_loss":56.6517711698,
861
+ "sents_f":0.9959496442,
862
+ "sents_p":0.9945343244,
863
+ "sents_r":0.997368998,
864
+ "tag_acc":0.9640958514,
865
+ "pos_acc":0.9831838987,
866
+ "morph_acc":0.9581374663,
867
  "morph_per_feat":{
868
  "Case":{
869
+ "p":0.9639107612,
870
+ "r":0.9600554205,
871
+ "f":0.9619792281
872
  },
873
  "Gender":{
874
+ "p":0.9704207442,
875
+ "r":0.9668658936,
876
+ "f":0.9686400574
877
  },
878
  "Number":{
879
+ "p":0.9882738621,
880
+ "r":0.985457441,
881
+ "f":0.9868636421
882
  },
883
  "Mood":{
884
+ "p":0.9832762836,
885
+ "r":0.9826035966,
886
+ "f":0.982939825
887
  },
888
  "Person":{
889
+ "p":0.9932344122,
890
+ "r":0.9924713836,
891
+ "f":0.9928527513
892
  },
893
  "Tense":{
894
+ "p":0.976521164,
895
+ "r":0.9763597289,
896
+ "f":0.9764404398
897
  },
898
  "VerbForm":{
899
+ "p":0.9860205033,
900
+ "r":0.9809271523,
901
+ "f":0.9834672333
902
  },
903
  "Voice":{
904
+ "p":0.9829952525,
905
+ "r":0.9858886676,
906
+ "f":0.984439834
907
  }
908
  },
909
+ "lemma_acc":0.9531809911,
910
+ "dep_uas":0.8882308136,
911
+ "dep_las":0.8492401865,
912
  "dep_las_per_type":{
913
  "root":{
914
+ "p":0.931788369,
915
+ "r":0.9344442008,
916
+ "f":0.9331143952
917
  },
918
  "cop":{
919
+ "p":0.8331784387,
920
+ "r":0.8542162935,
921
+ "f":0.8435662197
922
  },
923
  "nsubj":{
924
+ "p":0.8516872095,
925
+ "r":0.8516872095,
926
+ "f":0.8516872095
927
  },
928
  "nmod":{
929
+ "p":0.8462984724,
930
+ "r":0.8320240296,
931
+ "f":0.8391005476
932
  },
933
  "obj":{
934
+ "p":0.8521505376,
935
+ "r":0.8643490116,
936
+ "f":0.8582064298
937
  },
938
  "det":{
939
+ "p":0.9342518733,
940
+ "r":0.9319990354,
941
+ "f":0.9331240946
942
  },
943
  "cc":{
944
+ "p":0.9080487222,
945
+ "r":0.9205811138,
946
+ "f":0.9142719731
947
  },
948
  "conj":{
949
+ "p":0.7781139687,
950
+ "r":0.7745983936,
951
+ "f":0.7763522013
952
  },
953
  "nummod":{
954
+ "p":0.9216101695,
955
+ "r":0.935483871,
956
+ "f":0.9284951974
957
  },
958
  "case":{
959
+ "p":0.9727039178,
960
+ "r":0.9832819348,
961
+ "f":0.9779643232
962
  },
963
  "obl":{
964
+ "p":0.8034008407,
965
+ "r":0.8227352769,
966
+ "f":0.8129531174
967
  },
968
  "acl":{
969
+ "p":0.7437788018,
970
+ "r":0.6879795396,
971
+ "f":0.7147918512
972
  },
973
  "ccomp":{
974
+ "p":0.6428571429,
975
+ "r":0.6331658291,
976
+ "f":0.6379746835
977
  },
978
  "acl:relcl":{
979
+ "p":0.7105263158,
980
+ "r":0.700843608,
981
+ "f":0.7056517478
982
  },
983
  "advmod":{
984
+ "p":0.7999080037,
985
+ "r":0.8080855019,
986
+ "f":0.8039759593
987
  },
988
  "mark":{
989
+ "p":0.8895979579,
990
+ "r":0.8704339682,
991
+ "f":0.8799116301
992
  },
993
  "xcomp":{
994
+ "p":0.795045045,
995
+ "r":0.7998489426,
996
+ "f":0.797439759
997
  },
998
  "csubj:pass":{
999
+ "p":0.7127659574,
1000
+ "r":0.6380952381,
1001
+ "f":0.6733668342
1002
  },
1003
  "advmod:lmod":{
1004
+ "p":0.9055944056,
1005
+ "r":0.8839590444,
1006
+ "f":0.8946459413
1007
  },
1008
  "obl:arg":{
1009
+ "p":0.8297546012,
1010
+ "r":0.8098802395,
1011
+ "f":0.8196969697
1012
  },
1013
  "csubj":{
1014
+ "p":0.7437106918,
1015
+ "r":0.7413793103,
1016
+ "f":0.7425431711
1017
  },
1018
  "discourse":{
1019
+ "p":0.8915662651,
1020
+ "r":0.8921319797,
1021
+ "f":0.8918490327
1022
  },
1023
  "advcl":{
1024
+ "p":0.6983372922,
1025
+ "r":0.7191780822,
1026
+ "f":0.708604483
1027
  },
1028
  "nsubj:pass":{
1029
+ "p":0.8316205534,
1030
+ "r":0.8449799197,
1031
+ "f":0.838247012
1032
  },
1033
  "advmod:tmod":{
1034
+ "p":0.7633928571,
1035
+ "r":0.7844036697,
1036
+ "f":0.7737556561
1037
  },
1038
  "advmod:emph":{
1039
+ "p":0.72,
1040
+ "r":0.69,
1041
+ "f":0.7046808511
1042
  },
1043
  "amod":{
1044
+ "p":0.8834289813,
1045
  "r":0.8875675676,
1046
+ "f":0.8854934388
1047
  },
1048
  "conj:expl":{
1049
+ "p":0.4909090909,
1050
+ "r":0.2903225806,
1051
+ "f":0.3648648649
1052
  },
1053
  "advmod:neg":{
1054
+ "p":0.8902554399,
1055
+ "r":0.8961904762,
1056
+ "f":0.8932130992
1057
  },
1058
  "advcl:cmp":{
1059
+ "p":0.6553191489,
1060
+ "r":0.6184738956,
1061
+ "f":0.6363636364
1062
  },
1063
  "nsubj:outer":{
1064
+ "p":0.35,
1065
  "r":0.3684210526,
1066
+ "f":0.358974359
1067
+ },
1068
+ "advcl:pred":{
1069
+ "p":0.329787234,
1070
+ "r":0.1731843575,
1071
+ "f":0.2271062271
1072
  },
1073
  "advcl:abs":{
1074
+ "p":0.8535911602,
1075
+ "r":0.860724234,
1076
+ "f":0.8571428571
1077
  },
1078
  "aux:pass":{
1079
+ "p":0.94,
1080
  "r":0.9575551783,
1081
+ "f":0.9486963835
1082
+ },
1083
+ "fixed":{
1084
+ "p":0.9532019704,
1085
+ "r":0.8958333333,
1086
+ "f":0.923627685
1087
+ },
1088
+ "orphan":{
1089
+ "p":0.5299145299,
1090
+ "r":0.3827160494,
1091
+ "f":0.4444444444
1092
  },
1093
  "dep":{
1094
  "p":0.0,
1095
  "r":0.0,
1096
  "f":0.0
1097
  },
 
 
 
 
 
 
 
 
 
 
1098
  "aux":{
1099
+ "p":0.8827586207,
1100
+ "r":0.9208633094,
1101
+ "f":0.9014084507
1102
  },
1103
  "appos":{
1104
+ "p":0.9301242236,
1105
+ "r":0.8946975355,
1106
+ "f":0.9120669966
 
 
 
 
 
1107
  },
1108
  "parataxis":{
1109
+ "p":0.5,
1110
+ "r":0.3541666667,
1111
+ "f":0.4146341463
1112
+ },
1113
+ "dislocated:obj":{
1114
+ "p":0.7676767677,
1115
+ "r":0.7835051546,
1116
+ "f":0.7755102041
1117
  },
1118
  "flat":{
1119
+ "p":0.8504273504,
1120
+ "r":0.7713178295,
1121
+ "f":0.8089430894
1122
  },
1123
  "vocative":{
1124
+ "p":0.625,
1125
  "r":0.5737704918,
1126
+ "f":0.5982905983
 
 
 
 
 
1127
  },
1128
+ "ccomp:reported":{
1129
+ "p":0.2857142857,
1130
+ "r":0.4347826087,
1131
+ "f":0.3448275862
1132
  },
1133
  "reparandum":{
1134
+ "p":0.3333333333,
1135
  "r":0.0833333333,
1136
+ "f":0.1333333333
1137
  },
1138
  "dislocated:nsubj":{
1139
  "p":0.0,
 
1145
  "r":0.0,
1146
  "f":0.0
1147
  },
1148
+ "dislocated":{
1149
+ "p":0.4444444444,
1150
+ "r":0.1290322581,
1151
+ "f":0.2
1152
+ },
1153
  "obl:agent":{
1154
+ "p":0.52,
1155
+ "r":0.2574257426,
1156
+ "f":0.3443708609
1157
  },
1158
  "flat:name":{
1159
+ "p":0.6842105263,
1160
  "r":0.8965517241,
1161
+ "f":0.776119403
 
 
 
 
 
1162
  },
1163
  "obl:tmod":{
1164
+ "p":0.25,
1165
  "r":0.125,
1166
+ "f":0.1666666667
1167
  },
1168
+ "flat:foreign":{
1169
  "p":0.0,
1170
  "r":0.0,
1171
  "f":0.0
1172
  },
1173
+ "obl:lmod":{
1174
+ "p":0.0,
1175
+ "r":0.0,
1176
+ "f":0.0
1177
  },
1178
  "parataxis:reporting":{
1179
  "p":0.0,
 
1211
  "f":0.0
1212
  }
1213
  },
1214
+ "senter_loss":149.3224464637,
1215
+ "tagger_loss":103.7340285726,
1216
+ "morphologizer_loss":1120.5138759464,
1217
+ "trainable_lemmatizer_loss":1068.4113499675,
1218
+ "parser_loss":19163.8495826822
 
 
1219
  },
1220
  "sources":[
1221
+ "UD_Latin-Perseus (via Gamba/Zeman 2023)",
1222
+ "UD_Latin-PROIEL (via Gamba/Zeman 2023)",
1223
+ "UD_Latin-ITTB (via Gamba/Zeman 2023)",
1224
+ "UD_Latin-LLCT (via Gamba/Zeman 2023",
1225
+ "UD_Latin-UDante (via Gamba/Zeman 2023)",
1226
+ "CIRCSE/LASLA: LASLA Corpus",
1227
+ "UD_Latin-CIRCSE"
1228
  ],
1229
  "requirements":[
1230
  "spacy_lookups_data @ git+https://github.com/diyclassics/spacy-lookups-data.git#egg=spacy-lookups-data",
morphologizer/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:30435d1da8a5f898bd19f5bc6703b9285cc95c49162bc4ab96caac71eed21f23
+ oid sha256:4314e207681175f24999911fb6021e49163510932e816af12cb1ce5db54901d4
  size 675109300
ner/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:85126f3a3556e73e3599c6df524d948dcad7ba4b862c149964e4f67d26a22bf0
+ oid sha256:418d4628460fe4adac04843c2da156ebbe59eee6624b18e20c32ebd9231704c8
  size 673159757
ner/moves CHANGED
@@ -1 +1 @@
- ��moves��{"0":{},"1":{"PERSON":16648,"LOC":2821,"NORP":113},"2":{"PERSON":16648,"LOC":2821,"NORP":113},"3":{"PERSON":16648,"LOC":2821,"NORP":113},"4":{"PERSON":16648,"LOC":2821,"NORP":113,"":1},"5":{"":1}}�cfg��neg_key�
+ ��moves��{"0":{},"1":{"PERSON":16680,"LOC":2845,"NORP":119},"2":{"PERSON":16680,"LOC":2845,"NORP":119},"3":{"PERSON":16680,"LOC":2845,"NORP":119},"4":{"PERSON":16680,"LOC":2845,"NORP":119,"":1},"5":{"":1}}�cfg��neg_key�
parser/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cc25a9aec0dc6774ce3f44d6ed97df18c23d081f4211faad663c01a882488166
+ oid sha256:603d1ed7d520cc49dd6dbda77f5734ebdaa1af716137d8b4413e2dd0a9f38ad6
  size 676235346
senter/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+ "overwrite":false
+ }
senter/model ADDED
Binary file (255 kB).
 
tagger/model CHANGED
Binary files a/tagger/model and b/tagger/model differ
 
trainable_lemmatizer/cfg CHANGED
The diff for this file is too large to render.
 
trainable_lemmatizer/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1b0fa3d6a2f6e6a561668232826877c1fd38cd426400d2efae30530fbbe1498d
- size 32298653
+ oid sha256:af6dfb059ad18026fdc3e92c427c082c551aef965740e70b9988b18722e750f3
+ size 14131797
trainable_lemmatizer/trees CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:866d37b69b5676497496a4cd63e59bb563f2aa2c4a32ba6d2316e5af3d8bc0ac
- size 1895920
+ oid sha256:a15aefb70d9da8a30f7072f6609d8523a253cf54394eade11c4868ee7a09010b
+ size 949814
transformer/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3f494868ee0533c24cf35ea0819d47f8be396592621c75279b08b140f320648f
+ oid sha256:b631c9775406d20af454f70054c364541d2ea0f5dccd1cc390296008d1ea6c0f
  size 672940068
vocab/strings.json CHANGED
The diff for this file is too large to render.