osanseviero committed on
Commit
1868f26
•
1 Parent(s): 4524290

Update spaCy pipeline

.gitattributes CHANGED
@@ -14,3 +14,7 @@
  *.pb filter=lfs diff=lfs merge=lfs -text
  *.pt filter=lfs diff=lfs merge=lfs -text
  *.pth filter=lfs diff=lfs merge=lfs -text
+ *.whl filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *strings.json filter=lfs diff=lfs merge=lfs -text
+ vectors filter=lfs diff=lfs merge=lfs -text
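The four added rules send wheels, NumPy archives, string stores, and `vectors` files through Git LFS. The sketch below is only an approximation of how such rules select files: it assumes that a pattern without a slash is matched against the file's basename, and it uses Python's `fnmatchcase` rather than Git's own matcher. The helper name `is_lfs_tracked` and the example paths are hypothetical, and only the patterns visible in this hunk are listed.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns visible in the .gitattributes hunk (lines 14-20 of the new file).
# Earlier lines of the file are not shown in the diff and are not covered here.
LFS_PATTERNS = ["*.pb", "*.pt", "*.pth", "*.whl", "*.npz", "*strings.json", "vectors"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: compare the basename against each visible pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to exercise the patterns.
    for candidate in ["vocab/strings.json", "vocab/vectors", "pkg-0.0.0-py3-none-any.whl", "meta.json"]:
        print(candidate, is_lfs_tracked(candidate))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports what Git itself resolves.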
LICENSE ADDED
@@ -0,0 +1,19 @@
+ Copyright 2021 ExplosionAI GmbH
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
+ this software and associated documentation files (the "Software"), to deal in
+ the Software without restriction, including without limitation the rights to
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+ of the Software, and to permit persons to whom the Software is furnished to do
+ so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
LICENSES_SOURCES ADDED
@@ -0,0 +1,24 @@
+ # OntoNotes 5
+
+ * Author: Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston
+ * URL: https://catalog.ldc.upenn.edu/LDC2013T19
+ * License: commercial (licensed by Explosion)
+
+ ```
+ ```
+
+
+
+
+ # CoreNLP Universal Dependencies Converter
+
+ * Author: Stanford NLP Group
+ * URL: https://nlp.stanford.edu/software/stanford-dependencies.html
+ * License: Citation provided for reference, no code packaged with model
+
+ ```
+ ```
+
+
+
+
README.md ADDED
@@ -0,0 +1,103 @@
+ ---
+ tags:
+ - spacy
+ - token-classification
+ language:
+ - zh
+ license: MIT
+ model-index:
+ - name: zh_core_web_sm
+   results:
+   - tasks:
+       name: NER
+       type: token-classification
+     metrics:
+     - name: Precision
+       type: precision
+       value: 0.7224990884
+     - name: Recall
+       type: recall
+       value: 0.6531868132
+     - name: F Score
+       type: f_score
+       value: 0.6860968431
+   - tasks:
+       name: POS
+       type: token-classification
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.8957464158
+   - tasks:
+       name: SENTER
+       type: token-classification
+     metrics:
+     - name: Precision
+       type: precision
+       value: 0.7817728729
+     - name: Recall
+       type: recall
+       value: 0.7311469952
+     - name: F Score
+       type: f_score
+       value: 0.7556129032
+   - tasks:
+       name: UNLABELED_DEPENDENCIES
+       type: token-classification
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.6965379684
+   - tasks:
+       name: LABELED_DEPENDENCIES
+       type: token-classification
+     metrics:
+     - name: Accuracy
+       type: accuracy
+       value: 0.6426392548
+ ---
+ ### Details: https://spacy.io/models/zh#zh_core_web_sm
+
+ Chinese pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler.
+
+ | Feature | Description |
+ | --- | --- |
+ | **Name** | `zh_core_web_sm` |
+ | **Version** | `3.1.0` |
+ | **spaCy** | `>=3.1.0,<3.2.0` |
+ | **Default Pipeline** | `tok2vec`, `tagger`, `parser`, `attribute_ruler`, `ner` |
+ | **Components** | `tok2vec`, `tagger`, `parser`, `senter`, `attribute_ruler`, `ner` |
+ | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
+ | **Sources** | [OntoNotes 5](https://catalog.ldc.upenn.edu/LDC2013T19) (Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston)<br />[CoreNLP Universal Dependencies Converter](https://nlp.stanford.edu/software/stanford-dependencies.html) (Stanford NLP Group) |
+ | **License** | `MIT` |
+ | **Author** | [Explosion](https://explosion.ai) |
+
+ ### Label Scheme
+
+ <details>
+
+ <summary>View label scheme (101 labels for 4 components)</summary>
+
+ | Component | Labels |
+ | --- | --- |
+ | **`tagger`** | `AD`, `AS`, `BA`, `CC`, `CD`, `CS`, `DEC`, `DEG`, `DER`, `DEV`, `DT`, `ETC`, `FW`, `IJ`, `INF`, `JJ`, `LB`, `LC`, `M`, `MSP`, `NN`, `NR`, `NT`, `OD`, `ON`, `P`, `PN`, `PU`, `SB`, `SP`, `URL`, `VA`, `VC`, `VE`, `VV`, `X` |
+ | **`parser`** | `ROOT`, `acl`, `advcl:loc`, `advmod`, `advmod:dvp`, `advmod:loc`, `advmod:rcomp`, `amod`, `amod:ordmod`, `appos`, `aux:asp`, `aux:ba`, `aux:modal`, `aux:prtmod`, `auxpass`, `case`, `cc`, `ccomp`, `compound:nn`, `compound:vc`, `conj`, `cop`, `dep`, `det`, `discourse`, `dobj`, `etc`, `mark`, `mark:clf`, `name`, `neg`, `nmod`, `nmod:assmod`, `nmod:poss`, `nmod:prep`, `nmod:range`, `nmod:tmod`, `nmod:topic`, `nsubj`, `nsubj:xsubj`, `nsubjpass`, `nummod`, `parataxis:prnmod`, `punct`, `xcomp` |
+ | **`senter`** | `I`, `S` |
+ | **`ner`** | `CARDINAL`, `DATE`, `EVENT`, `FAC`, `GPE`, `LANGUAGE`, `LAW`, `LOC`, `MONEY`, `NORP`, `ORDINAL`, `ORG`, `PERCENT`, `PERSON`, `PRODUCT`, `QUANTITY`, `TIME`, `WORK_OF_ART` |
+
+ </details>
+
+ ### Accuracy
+
+ | Type | Score |
+ | --- | --- |
+ | `TOKEN_ACC` | 97.88 |
+ | `TAG_ACC` | 89.57 |
+ | `DEP_UAS` | 69.65 |
+ | `DEP_LAS` | 64.26 |
+ | `ENTS_P` | 72.25 |
+ | `ENTS_R` | 65.32 |
+ | `ENTS_F` | 68.61 |
+ | `SENTS_P` | 78.18 |
+ | `SENTS_R` | 73.11 |
+ | `SENTS_F` | 75.56 |
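The F scores in the README can be cross-checked against the reported precision and recall. The sketch below assumes the usual definition of F1 as the harmonic mean of the two; the function name `f_score` is illustrative and not taken from the repository.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the README front matter.
ents_f = f_score(0.7224990884, 0.6531868132)
sents_f = f_score(0.7817728729, 0.7311469952)

if __name__ == "__main__":
    # Scaled by 100 and rounded to two decimals, as in the Accuracy table.
    print(f"ENTS_F  = {ents_f * 100:.2f}")
    print(f"SENTS_F = {sents_f * 100:.2f}")
```

Under that assumption the results agree with the `ENTS_F` and `SENTS_F` rows of the Accuracy table.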
accuracy.json ADDED
@@ -0,0 +1,332 @@
+ {
+   "token_acc": 0.9788303388,
+   "tag_acc": 0.8957464158,
+   "dep_uas": 0.6965379684,
+   "dep_las": 0.6426392548,
+   "ents_p": 0.7224990884,
+   "ents_r": 0.6531868132,
+   "ents_f": 0.6860968431,
+   "sents_p": 0.7817728729,
+   "sents_r": 0.7311469952,
+   "sents_f": 0.7556129032,
+   "speed": 10175.5709293766,
+   "dep_las_per_type": {
+     "dep": {
+       "p": 0.4702473498,
+       "r": 0.3361624735,
+       "f": 0.3920575065
+     },
+     "case": {
+       "p": 0.8028549383,
+       "r": 0.7569107662,
+       "f": 0.7792061907
+     },
+     "nmod:tmod": {
+       "p": 0.7231788079,
+       "r": 0.7428571429,
+       "f": 0.732885906
+     },
+     "nummod": {
+       "p": 0.8233471074,
+       "r": 0.5309793471,
+       "f": 0.6456055083
+     },
+     "mark:clf": {
+       "p": 0.9301898347,
+       "r": 0.5665796345,
+       "f": 0.7042188224
+     },
+     "auxpass": {
+       "p": 0.8756756757,
+       "r": 0.8756756757,
+       "f": 0.8756756757
+     },
+     "nsubj": {
+       "p": 0.771189813,
+       "r": 0.7141628793,
+       "f": 0.7415816327
+     },
+     "acl": {
+       "p": 0.6791758646,
+       "r": 0.5119245702,
+       "f": 0.5838077166
+     },
+     "advmod": {
+       "p": 0.8065869786,
+       "r": 0.7189979596,
+       "f": 0.7602780774
+     },
+     "mark": {
+       "p": 0.7065868263,
+       "r": 0.6722173532,
+       "f": 0.6889737256
+     },
+     "xcomp": {
+       "p": 0.7559198543,
+       "r": 0.6758957655,
+       "f": 0.7136715391
+     },
+     "nmod:assmod": {
+       "p": 0.7642786398,
+       "r": 0.7205104264,
+       "f": 0.7417494393
+     },
+     "det": {
+       "p": 0.8394160584,
+       "r": 0.6063268893,
+       "f": 0.7040816327
+     },
+     "amod": {
+       "p": 0.7544338336,
+       "r": 0.6516103692,
+       "f": 0.6992623815
+     },
+     "nmod:prep": {
+       "p": 0.7013125222,
+       "r": 0.5980036298,
+       "f": 0.6455510204
+     },
+     "root": {
+       "p": 0.7283996995,
+       "r": 0.6455801565,
+       "f": 0.6844938664
+     },
+     "aux:prtmod": {
+       "p": 0.890625,
+       "r": 0.8142857143,
+       "f": 0.8507462687
+     },
+     "compound:nn": {
+       "p": 0.7243023667,
+       "r": 0.6939086294,
+       "f": 0.7087798133
+     },
+     "dobj": {
+       "p": 0.780507386,
+       "r": 0.7200414753,
+       "f": 0.7490561677
+     },
+     "ccomp": {
+       "p": 0.6268199234,
+       "r": 0.6360808709,
+       "f": 0.6314164415
+     },
+     "advmod:rcomp": {
+       "p": 0.8096774194,
+       "r": 0.6952908587,
+       "f": 0.7481371088
+     },
+     "nmod:topic": {
+       "p": 0.3686868687,
+       "r": 0.237012987,
+       "f": 0.2885375494
+     },
+     "cop": {
+       "p": 0.7385620915,
+       "r": 0.5817245817,
+       "f": 0.6508279338
+     },
+     "discourse": {
+       "p": 0.5540037244,
+       "r": 0.4909240924,
+       "f": 0.52055993
+     },
+     "neg": {
+       "p": 0.823880597,
+       "r": 0.6563614744,
+       "f": 0.730641959
+     },
+     "aux:modal": {
+       "p": 0.8563772776,
+       "r": 0.8262668046,
+       "f": 0.8410526316
+     },
+     "nmod": {
+       "p": 0.7135761589,
+       "r": 0.5848032564,
+       "f": 0.6428038777
+     },
+     "aux:ba": {
+       "p": 0.8087431694,
+       "r": 0.7872340426,
+       "f": 0.7978436658
+     },
+     "advmod:loc": {
+       "p": 0.58203125,
+       "r": 0.4421364985,
+       "f": 0.502529511
+     },
+     "aux:asp": {
+       "p": 0.9053941909,
+       "r": 0.870015949,
+       "f": 0.8873525824
+     },
+     "conj": {
+       "p": 0.4784786642,
+       "r": 0.4875236295,
+       "f": 0.4829588015
+     },
+     "nsubjpass": {
+       "p": 0.8292682927,
+       "r": 0.68,
+       "f": 0.7472527473
+     },
+     "compound:vc": {
+       "p": 0.3876404494,
+       "r": 0.3575129534,
+       "f": 0.371967655
+     },
+     "advcl:loc": {
+       "p": 0.5304347826,
+       "r": 0.4357142857,
+       "f": 0.4784313725
+     },
+     "cc": {
+       "p": 0.6937618147,
+       "r": 0.6512866016,
+       "f": 0.6718535469
+     },
+     "advmod:dvp": {
+       "p": 0.8114754098,
+       "r": 0.6149068323,
+       "f": 0.6996466431
+     },
+     "appos": {
+       "p": 0.8778054863,
+       "r": 0.8091954023,
+       "f": 0.8421052632
+     },
+     "nmod:range": {
+       "p": 0.6897810219,
+       "r": 0.6342281879,
+       "f": 0.6608391608
+     },
+     "nmod:poss": {
+       "p": 0.6989247312,
+       "r": 0.4814814815,
+       "f": 0.5701754386
+     },
+     "name": {
+       "p": 0.6391752577,
+       "r": 0.4592592593,
+       "f": 0.5344827586
+     },
+     "nsubj:xsubj": {
+       "p": 0.0,
+       "r": 0.0,
+       "f": 0.0
+     },
+     "parataxis:prnmod": {
+       "p": 0.4516129032,
+       "r": 0.1052631579,
+       "f": 0.1707317073
+     },
+     "amod:ordmod": {
+       "p": 0.6274509804,
+       "r": 0.5,
+       "f": 0.5565217391
+     },
+     "erased": {
+       "p": 0.0,
+       "r": 0.0,
+       "f": 0.0
+     },
+     "etc": {
+       "p": 0.8837209302,
+       "r": 0.9047619048,
+       "f": 0.8941176471
+     }
+   },
+   "ents_per_type": {
+     "DATE": {
+       "p": 0.75,
+       "r": 0.7849355798,
+       "f": 0.7670702179
+     },
+     "GPE": {
+       "p": 0.7579383341,
+       "r": 0.8049853372,
+       "f": 0.7807537331
+     },
+     "ORDINAL": {
+       "p": 0.8603351955,
+       "r": 0.8105263158,
+       "f": 0.8346883469
+     },
+     "FAC": {
+       "p": 0.4482758621,
+       "r": 0.2795698925,
+       "f": 0.3443708609
+     },
+     "ORG": {
+       "p": 0.6875,
+       "r": 0.602739726,
+       "f": 0.6423357664
+     },
+     "QUANTITY": {
+       "p": 0.7777777778,
+       "r": 0.6222222222,
+       "f": 0.6913580247
+     },
+     "PERSON": {
+       "p": 0.8103932584,
+       "r": 0.743556701,
+       "f": 0.7755376344
+     },
+     "CARDINAL": {
+       "p": 0.5814220183,
+       "r": 0.5110887097,
+       "f": 0.5439914163
+     },
+     "LOC": {
+       "p": 0.5319148936,
+       "r": 0.3360215054,
+       "f": 0.4118616145
+     },
+     "NORP": {
+       "p": 0.6774193548,
+       "r": 0.4411764706,
+       "f": 0.534351145
+     },
+     "WORK_OF_ART": {
+       "p": 0.4520547945,
+       "r": 0.22,
+       "f": 0.2959641256
+     },
+     "TIME": {
+       "p": 0.7438423645,
+       "r": 0.7330097087,
+       "f": 0.7383863081
+     },
+     "MONEY": {
+       "p": 0.9292035398,
+       "r": 0.7777777778,
+       "f": 0.8467741935
+     },
+     "PERCENT": {
+       "p": 0.8395061728,
+       "r": 0.8192771084,
+       "f": 0.8292682927
+     },
+     "EVENT": {
+       "p": 0.6170212766,
+       "r": 0.4264705882,
+       "f": 0.5043478261
+     },
+     "PRODUCT": {
+       "p": 0.0,
+       "r": 0.0,
+       "f": 0.0
+     },
+     "LAW": {
+       "p": 0.3043478261,
+       "r": 0.1166666667,
+       "f": 0.1686746988
+     },
+     "LANGUAGE": {
+       "p": 0.5,
+       "r": 0.5555555556,
+       "f": 0.5263157895
+     }
+   }
+ }
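The per-type blocks in `accuracy.json` make it easy to see which labels pull the aggregate scores down. The sketch below is one possible way to rank them; it embeds a five-label excerpt of `ents_per_type` copied from the file rather than reading it from disk, and the helper name `weakest_labels` is hypothetical.

```python
import json
from typing import Dict, List

# Excerpt of "ents_per_type" copied from accuracy.json (five of the 18 labels).
ENTS_EXCERPT = json.loads("""
{
  "GPE":         {"p": 0.7579383341, "r": 0.8049853372, "f": 0.7807537331},
  "WORK_OF_ART": {"p": 0.4520547945, "r": 0.22,         "f": 0.2959641256},
  "MONEY":       {"p": 0.9292035398, "r": 0.7777777778, "f": 0.8467741935},
  "PRODUCT":     {"p": 0.0,          "r": 0.0,          "f": 0.0},
  "LAW":         {"p": 0.3043478261, "r": 0.1166666667, "f": 0.1686746988}
}
""")


def weakest_labels(per_type: Dict[str, Dict[str, float]], n: int) -> List[str]:
    """Return the n labels with the lowest F score, lowest first."""
    return sorted(per_type, key=lambda label: per_type[label]["f"])[:n]


if __name__ == "__main__":
    for label in weakest_labels(ENTS_EXCERPT, 3):
        print(label, ENTS_EXCERPT[label]["f"])
```

The same function can be pointed at the full `ents_per_type` or `dep_las_per_type` mapping after loading the file with `json.load`.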
attribute_ruler/patterns ADDED
Binary file (1.93 kB)
config.cfg ADDED
@@ -0,0 +1,255 @@
+ [paths]
+ train = "corpus/zh-core-news/train.spacy"
+ dev = "corpus/zh-core-news/dev.spacy"
+ vectors = null
+ raw = null
+ init_tok2vec = null
+ vocab_data = null
+
+ [system]
+ gpu_allocator = null
+ seed = 0
+
+ [nlp]
+ lang = "zh"
+ pipeline = ["tok2vec","tagger","parser","senter","attribute_ruler","ner"]
+ disabled = ["senter"]
+ before_creation = null
+ after_creation = null
+ after_pipeline_creation = null
+ batch_size = 256
+
+ [nlp.tokenizer]
+ @tokenizers = "spacy.zh.ChineseTokenizer"
+ segmenter = "pkuseg"
+
+ [components]
+
+ [components.attribute_ruler]
+ factory = "attribute_ruler"
+ validate = false
+
+ [components.ner]
+ factory = "ner"
+ incorrect_spans_key = null
+ moves = null
+ update_with_oracle_cut_size = 100
+
+ [components.ner.model]
+ @architectures = "spacy.TransitionBasedParser.v2"
+ state_type = "ner"
+ extra_state_tokens = false
+ hidden_width = 64
+ maxout_pieces = 2
+ use_upper = true
+ nO = null
+
+ [components.ner.model.tok2vec]
+ @architectures = "spacy.Tok2Vec.v2"
+
+ [components.ner.model.tok2vec.embed]
+ @architectures = "spacy.MultiHashEmbed.v2"
+ width = 96
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
+ rows = [5000,2500,2500,2500]
+ include_static_vectors = false
+
+ [components.ner.model.tok2vec.encode]
+ @architectures = "spacy.MaxoutWindowEncoder.v2"
+ width = 96
+ depth = 4
+ window_size = 1
+ maxout_pieces = 3
+
+ [components.parser]
+ factory = "parser"
+ learn_tokens = false
+ min_action_freq = 30
+ moves = null
+ update_with_oracle_cut_size = 100
+
+ [components.parser.model]
+ @architectures = "spacy.TransitionBasedParser.v2"
+ state_type = "parser"
+ extra_state_tokens = false
+ hidden_width = 64
+ maxout_pieces = 2
+ use_upper = true
+ nO = null
+
+ [components.parser.model.tok2vec]
+ @architectures = "spacy.Tok2VecListener.v1"
+ width = ${components.tok2vec.model.encode:width}
+ upstream = "tok2vec"
+
+ [components.senter]
+ factory = "senter"
+
+ [components.senter.model]
+ @architectures = "spacy.Tagger.v1"
+ nO = null
+
+ [components.senter.model.tok2vec]
+ @architectures = "spacy.Tok2Vec.v2"
+
+ [components.senter.model.tok2vec.embed]
+ @architectures = "spacy.MultiHashEmbed.v2"
+ width = 16
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
+ rows = [1000,500,500,500]
+ include_static_vectors = false
+
+ [components.senter.model.tok2vec.encode]
+ @architectures = "spacy.MaxoutWindowEncoder.v2"
+ width = 16
+ depth = 2
+ window_size = 1
+ maxout_pieces = 2
+
+ [components.tagger]
+ factory = "tagger"
+
+ [components.tagger.model]
+ @architectures = "spacy.Tagger.v1"
+ nO = null
+
+ [components.tagger.model.tok2vec]
+ @architectures = "spacy.Tok2VecListener.v1"
+ width = ${components.tok2vec.model.encode:width}
+ upstream = "tok2vec"
+
+ [components.tok2vec]
+ factory = "tok2vec"
+
+ [components.tok2vec.model]
+ @architectures = "spacy.Tok2Vec.v2"
+
+ [components.tok2vec.model.embed]
+ @architectures = "spacy.MultiHashEmbed.v2"
+ width = ${components.tok2vec.model.encode:width}
+ attrs = ["NORM","PREFIX","SUFFIX","SHAPE"]
+ rows = [5000,2500,2500,2500]
+ include_static_vectors = false
+
+ [components.tok2vec.model.encode]
+ @architectures = "spacy.MaxoutWindowEncoder.v2"
+ width = 96
+ depth = 4
+ window_size = 1
+ maxout_pieces = 3
+
+ [corpora]
+
+ [corpora.dev]
+ @readers = "spacy.Corpus.v1"
+ limit = 0
+ max_length = 0
+ path = ${paths:dev}
+ gold_preproc = false
+ augmenter = null
+
+ [corpora.train]
+ @readers = "spacy.Corpus.v1"
+ path = ${paths:train}
+ max_length = 5000
+ gold_preproc = false
+ limit = 0
+ augmenter = null
+
+ [training]
+ train_corpus = "corpora.train"
+ dev_corpus = "corpora.dev"
+ seed = ${system:seed}
+ gpu_allocator = ${system:gpu_allocator}
+ dropout = 0.1
+ accumulate_gradient = 1
+ patience = 5000
+ max_epochs = 0
+ max_steps = 0
+ eval_frequency = 1000
+ frozen_components = []
+ before_to_disk = null
+ annotating_components = []
+
+ [training.batcher]
+ @batchers = "spacy.batch_by_words.v1"
+ discard_oversize = false
+ tolerance = 0.2
+ get_length = null
+
+ [training.batcher.size]
+ @schedules = "compounding.v1"
+ start = 100
+ stop = 1000
+ compound = 1.001
+ t = 0.0
+
+ [training.logger]
+ @loggers = "spacy.WandbLogger.v1"
+ project_name = "spacy-v3.0.0a2"
+ remove_config_values = []
+
+ [training.optimizer]
+ @optimizers = "Adam.v1"
+ beta1 = 0.9
+ beta2 = 0.999
+ L2_is_weight_decay = true
+ L2 = 0.01
+ grad_clip = 1.0
+ use_averages = true
+ eps = 0.00000001
+ learn_rate = 0.001
+
+ [training.score_weights]
+ tag_acc = 0.24
+ dep_uas = 0.0
+ dep_las = 0.24
+ dep_las_per_type = null
+ sents_p = null
+ sents_r = null
+ sents_f = 0.03
+ ents_f = 0.5
+ ents_p = 0.0
+ ents_r = 0.0
+ ents_per_type = null
+
+ [pretraining]
+
+ [initialize]
+ vocab_data = ${paths.vocab_data}
+ vectors = ${paths.vectors}
+ init_tok2vec = ${paths.init_tok2vec}
+ before_init = null
+ after_init = null
+
+ [initialize.components]
+
+ [initialize.components.ner]
+
+ [initialize.components.ner.labels]
+ @readers = "spacy.read_labels.v1"
+ path = "corpus/labels/ner.json"
+ require = false
+
+ [initialize.components.parser]
+
+ [initialize.components.parser.labels]
+ @readers = "spacy.read_labels.v1"
+ path = "corpus/labels/parser.json"
+ require = false
+
+ [initialize.components.tagger]
+
+ [initialize.components.tagger.labels]
+ @readers = "spacy.read_labels.v1"
+ path = "corpus/labels/tagger.json"
+ require = false
+
+ [initialize.lookups]
+ @misc = "spacy.LookupsDataLoader.v1"
+ lang = ${nlp.lang}
+ tables = []
+
+ [initialize.tokenizer]
+ pkuseg_model = "assets/pkuseg_model"
+ pkuseg_user_dict = "default"
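The `[training.batcher.size]` block configures a compounding schedule for the batch size (`start = 100`, `stop = 1000`, `compound = 1.001`). The sketch below shows what such a schedule is commonly understood to produce: the value is multiplied by `compound` at every step and clipped at `stop`. It is a standalone illustration written for this note, not the registered `compounding.v1` function, and it ignores the `t` parameter.

```python
from itertools import islice
from typing import Iterator, List


def compounding(start: float, stop: float, compound: float) -> Iterator[float]:
    """Yield start, start*compound, start*compound**2, ... capped at stop.

    Illustrative only: assumes an increasing schedule (compound > 1, stop > start).
    """
    value = float(start)
    while True:
        yield min(value, float(stop))
        value *= compound


# Values from [training.batcher.size] in config.cfg.
sizes: List[float] = list(islice(compounding(100, 1000, 1.001), 3000))

if __name__ == "__main__":
    for step in (0, 1, 1000, 2000, 2400):
        print(step, round(sizes[step], 2))
```

With these settings the batch size grows by 0.1% per step, so under this reading it takes a little over 2,300 steps to reach the cap of 1000 words per batch.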
meta.json ADDED
@@ -0,0 +1,502 @@
+ {
+   "lang":"zh",
+   "name":"core_web_sm",
+   "version":"3.1.0",
+   "description":"Chinese pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler.",
+   "author":"Explosion",
+   "email":"[email protected]",
+   "url":"https://explosion.ai",
+   "license":"MIT",
+   "spacy_version":">=3.1.0,<3.2.0",
+   "spacy_git_version":"caba63b74",
+   "vectors":{
+     "width":0,
+     "vectors":0,
+     "keys":0,
+     "name":null
+   },
+   "labels":{
+     "tok2vec":[
+
+     ],
+     "tagger":[
+       "AD",
+       "AS",
+       "BA",
+       "CC",
+       "CD",
+       "CS",
+       "DEC",
+       "DEG",
+       "DER",
+       "DEV",
+       "DT",
+       "ETC",
+       "FW",
+       "IJ",
+       "INF",
+       "JJ",
+       "LB",
+       "LC",
+       "M",
+       "MSP",
+       "NN",
+       "NR",
+       "NT",
+       "OD",
+       "ON",
+       "P",
+       "PN",
+       "PU",
+       "SB",
+       "SP",
+       "URL",
+       "VA",
+       "VC",
+       "VE",
+       "VV",
+       "X"
+     ],
+     "parser":[
+       "ROOT",
+       "acl",
+       "advcl:loc",
+       "advmod",
+       "advmod:dvp",
+       "advmod:loc",
+       "advmod:rcomp",
+       "amod",
+       "amod:ordmod",
+       "appos",
+       "aux:asp",
+       "aux:ba",
+       "aux:modal",
+       "aux:prtmod",
+       "auxpass",
+       "case",
+       "cc",
+       "ccomp",
+       "compound:nn",
+       "compound:vc",
+       "conj",
+       "cop",
+       "dep",
+       "det",
+       "discourse",
+       "dobj",
+       "etc",
+       "mark",
+       "mark:clf",
+       "name",
+       "neg",
+       "nmod",
+       "nmod:assmod",
+       "nmod:poss",
+       "nmod:prep",
+       "nmod:range",
+       "nmod:tmod",
+       "nmod:topic",
+       "nsubj",
+       "nsubj:xsubj",
+       "nsubjpass",
+       "nummod",
+       "parataxis:prnmod",
+       "punct",
+       "xcomp"
+     ],
+     "senter":[
+       "I",
+       "S"
+     ],
+     "attribute_ruler":[
+
+     ],
+     "ner":[
+       "CARDINAL",
+       "DATE",
+       "EVENT",
+       "FAC",
+       "GPE",
+       "LANGUAGE",
+       "LAW",
+       "LOC",
+       "MONEY",
+       "NORP",
+       "ORDINAL",
+       "ORG",
+       "PERCENT",
+       "PERSON",
+       "PRODUCT",
+       "QUANTITY",
+       "TIME",
+       "WORK_OF_ART"
+     ]
+   },
+   "pipeline":[
+     "tok2vec",
+     "tagger",
+     "parser",
+     "attribute_ruler",
+     "ner"
+   ],
+   "components":[
+     "tok2vec",
+     "tagger",
+     "parser",
+     "senter",
+     "attribute_ruler",
+     "ner"
+   ],
+   "disabled":[
+     "senter"
+   ],
+   "performance":{
+     "token_acc":0.9788303388,
+     "tag_acc":0.8957464158,
+     "dep_uas":0.6965379684,
+     "dep_las":0.6426392548,
+     "ents_p":0.7224990884,
+     "ents_r":0.6531868132,
+     "ents_f":0.6860968431,
+     "sents_p":0.7817728729,
+     "sents_r":0.7311469952,
+     "sents_f":0.7556129032,
+     "speed":10175.5709293766,
+     "dep_las_per_type":{
+       "dep":{
+         "p":0.4702473498,
+         "r":0.3361624735,
+         "f":0.3920575065
+       },
+       "case":{
+         "p":0.8028549383,
+         "r":0.7569107662,
+         "f":0.7792061907
+       },
+       "nmod:tmod":{
+         "p":0.7231788079,
+         "r":0.7428571429,
+         "f":0.732885906
+       },
+       "nummod":{
+         "p":0.8233471074,
+         "r":0.5309793471,
+         "f":0.6456055083
+       },
+       "mark:clf":{
+         "p":0.9301898347,
+         "r":0.5665796345,
+         "f":0.7042188224
+       },
+       "auxpass":{
+         "p":0.8756756757,
+         "r":0.8756756757,
+         "f":0.8756756757
+       },
+       "nsubj":{
+         "p":0.771189813,
+         "r":0.7141628793,
+         "f":0.7415816327
+       },
+       "acl":{
+         "p":0.6791758646,
+         "r":0.5119245702,
+         "f":0.5838077166
+       },
+       "advmod":{
+         "p":0.8065869786,
+         "r":0.7189979596,
+         "f":0.7602780774
+       },
+       "mark":{
+         "p":0.7065868263,
+         "r":0.6722173532,
+         "f":0.6889737256
+       },
+       "xcomp":{
+         "p":0.7559198543,
+         "r":0.6758957655,
+         "f":0.7136715391
+       },
+       "nmod:assmod":{
+         "p":0.7642786398,
+         "r":0.7205104264,
+         "f":0.7417494393
+       },
+       "det":{
+         "p":0.8394160584,
+         "r":0.6063268893,
+         "f":0.7040816327
+       },
+       "amod":{
+         "p":0.7544338336,
+         "r":0.6516103692,
+         "f":0.6992623815
+       },
+       "nmod:prep":{
+         "p":0.7013125222,
+         "r":0.5980036298,
+         "f":0.6455510204
+       },
+       "root":{
+         "p":0.7283996995,
+         "r":0.6455801565,
+         "f":0.6844938664
+       },
+       "aux:prtmod":{
+         "p":0.890625,
+         "r":0.8142857143,
+         "f":0.8507462687
+       },
+       "compound:nn":{
+         "p":0.7243023667,
+         "r":0.6939086294,
+         "f":0.7087798133
+       },
+       "dobj":{
+         "p":0.780507386,
+         "r":0.7200414753,
+         "f":0.7490561677
+       },
+       "ccomp":{
+         "p":0.6268199234,
+         "r":0.6360808709,
+         "f":0.6314164415
+       },
+       "advmod:rcomp":{
+         "p":0.8096774194,
+         "r":0.6952908587,
+         "f":0.7481371088
+       },
+       "nmod:topic":{
+         "p":0.3686868687,
+         "r":0.237012987,
+         "f":0.2885375494
+       },
+       "cop":{
+         "p":0.7385620915,
+         "r":0.5817245817,
+         "f":0.6508279338
+       },
+       "discourse":{
+         "p":0.5540037244,
+         "r":0.4909240924,
+         "f":0.52055993
+       },
+       "neg":{
+         "p":0.823880597,
+         "r":0.6563614744,
+         "f":0.730641959
+       },
+       "aux:modal":{
+         "p":0.8563772776,
+         "r":0.8262668046,
+         "f":0.8410526316
+       },
+       "nmod":{
+         "p":0.7135761589,
+         "r":0.5848032564,
+         "f":0.6428038777
+       },
+       "aux:ba":{
+         "p":0.8087431694,
+         "r":0.7872340426,
+         "f":0.7978436658
+       },
+       "advmod:loc":{
+         "p":0.58203125,
+         "r":0.4421364985,
+         "f":0.502529511
+       },
+       "aux:asp":{
+         "p":0.9053941909,
+         "r":0.870015949,
+         "f":0.8873525824
+       },
+       "conj":{
+         "p":0.4784786642,
+         "r":0.4875236295,
+         "f":0.4829588015
+       },
+       "nsubjpass":{
+         "p":0.8292682927,
+         "r":0.68,
+         "f":0.7472527473
+       },
+       "compound:vc":{
+         "p":0.3876404494,
+         "r":0.3575129534,
+         "f":0.371967655
+       },
+       "advcl:loc":{
+         "p":0.5304347826,
+         "r":0.4357142857,
+         "f":0.4784313725
+       },
+       "cc":{
+         "p":0.6937618147,
+         "r":0.6512866016,
+         "f":0.6718535469
+       },
+       "advmod:dvp":{
+         "p":0.8114754098,
+         "r":0.6149068323,
+         "f":0.6996466431
+       },
+       "appos":{
+         "p":0.8778054863,
+         "r":0.8091954023,
+         "f":0.8421052632
+       },
+       "nmod:range":{
+         "p":0.6897810219,
+         "r":0.6342281879,
+         "f":0.6608391608
+       },
+       "nmod:poss":{
+         "p":0.6989247312,
+         "r":0.4814814815,
+         "f":0.5701754386
+       },
+       "name":{
+         "p":0.6391752577,
+         "r":0.4592592593,
+         "f":0.5344827586
+       },
+       "nsubj:xsubj":{
+         "p":0.0,
+         "r":0.0,
+         "f":0.0
+       },
+       "parataxis:prnmod":{
+         "p":0.4516129032,
+         "r":0.1052631579,
+         "f":0.1707317073
+       },
+       "amod:ordmod":{
+         "p":0.6274509804,
+         "r":0.5,
+         "f":0.5565217391
+       },
+       "erased":{
+         "p":0.0,
+         "r":0.0,
+         "f":0.0
+       },
+       "etc":{
+         "p":0.8837209302,
+         "r":0.9047619048,
+         "f":0.8941176471
+       }
+     },
+     "ents_per_type":{
+       "DATE":{
+         "p":0.75,
+         "r":0.7849355798,
+         "f":0.7670702179
+       },
+       "GPE":{
+         "p":0.7579383341,
+         "r":0.8049853372,
+         "f":0.7807537331
+       },
+       "ORDINAL":{
+         "p":0.8603351955,
+         "r":0.8105263158,
+         "f":0.8346883469
+       },
+       "FAC":{
+         "p":0.4482758621,
+         "r":0.2795698925,
+         "f":0.3443708609
+       },
+       "ORG":{
+         "p":0.6875,
+         "r":0.602739726,
+         "f":0.6423357664
+       },
+       "QUANTITY":{
+         "p":0.7777777778,
+         "r":0.6222222222,
+         "f":0.6913580247
+       },
+       "PERSON":{
+         "p":0.8103932584,
+         "r":0.743556701,
+         "f":0.7755376344
+       },
+       "CARDINAL":{
+         "p":0.5814220183,
+         "r":0.5110887097,
+         "f":0.5439914163
+       },
+       "LOC":{
+         "p":0.5319148936,
+         "r":0.3360215054,
+         "f":0.4118616145
+       },
+       "NORP":{
+         "p":0.6774193548,
+         "r":0.4411764706,
+         "f":0.534351145
+       },
+       "WORK_OF_ART":{
+         "p":0.4520547945,
+         "r":0.22,
+         "f":0.2959641256
+       },
+       "TIME":{
+         "p":0.7438423645,
+         "r":0.7330097087,
+         "f":0.7383863081
+       },
+       "MONEY":{
+         "p":0.9292035398,
+         "r":0.7777777778,
+         "f":0.8467741935
+       },
+       "PERCENT":{
+         "p":0.8395061728,
+         "r":0.8192771084,
+         "f":0.8292682927
+       },
+       "EVENT":{
+         "p":0.6170212766,
+         "r":0.4264705882,
+         "f":0.5043478261
+       },
+       "PRODUCT":{
+         "p":0.0,
+         "r":0.0,
+         "f":0.0
+       },
+       "LAW":{
+         "p":0.3043478261,
+         "r":0.1166666667,
+         "f":0.1686746988
+       },
+       "LANGUAGE":{
+         "p":0.5,
+         "r":0.5555555556,
+         "f":0.5263157895
+       }
+     }
+   },
+   "sources":[
+     {
+       "name":"OntoNotes 5",
+       "url":"https://catalog.ldc.upenn.edu/LDC2013T19",
+       "license":"commercial (licensed by Explosion)",
+       "author":"Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Robert Belvin, Ann Houston"
+     },
+     {
+       "name":"CoreNLP Universal Dependencies Converter",
+       "url":"https://nlp.stanford.edu/software/stanford-dependencies.html",
+       "author":"Stanford NLP Group",
+       "license":"Citation provided for reference, no code packaged with model"
+     }
+   ],
+   "requirements":[
+     "spacy-pkuseg>=0.0.27,<0.1.0"
+   ]
+ }
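`meta.json` pins the pipeline to `spacy_version` `>=3.1.0,<3.2.0` and requires `spacy-pkuseg>=0.0.27,<0.1.0`. The sketch below shows how such comma-separated range constraints can be evaluated for plain numeric versions. It is a simplified illustration: the helpers `parse_version` and `satisfies` are hypothetical, pre-release and local version segments are not handled, and a real check would normally rely on a full PEP 440 implementation.

```python
import operator
from typing import Tuple

# Two-character operators come first so ">=" is not mistaken for ">".
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.1.0' into (3, 1, 0); numeric release segments only."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version: str, constraint: str) -> bool:
    """Check a version against a comma-separated constraint such as '>=3.1.0,<3.2.0'."""
    candidate = parse_version(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                if not compare(candidate, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    for installed in ["3.0.6", "3.1.0", "3.1.4", "3.2.0"]:
        print(installed, satisfies(installed, ">=3.1.0,<3.2.0"))
```

The upper bound is exclusive, so any 3.1.x release is accepted while 3.2.0 is rejected.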
ner/cfg ADDED
@@ -0,0 +1,13 @@
+ {
+   "moves":null,
+   "update_with_oracle_cut_size":100,
+   "multitasks":[
+
+   ],
+   "min_action_freq":1,
+   "learn_tokens":false,
+   "beam_width":1,
+   "beam_density":0.0,
+   "beam_update_prob":0.0,
+   "incorrect_spans_key":null
+ }
ner/model ADDED
Binary file (6.73 MB)
ner/moves ADDED
@@ -0,0 +1 @@
+ moves {"0":{},"1":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"2":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"3":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336},"4":{"GPE":15943,"ORG":15205,"DATE":14256,"PERSON":10912,"CARDINAL":7849,"TIME":2905,"NORP":2685,"EVENT":2602,"MONEY":2519,"LOC":2452,"FAC":2256,"WORK_OF_ART":2014,"QUANTITY":1717,"ORDINAL":1156,"PERCENT":852,"LAW":695,"PRODUCT":486,"LANGUAGE":336,"":1},"5":{"":1}} cfg neg_key
parser/cfg ADDED
@@ -0,0 +1,13 @@
+ {
+ "moves":null,
+ "update_with_oracle_cut_size":100,
+ "multitasks":[
+
+ ],
+ "min_action_freq":30,
+ "learn_tokens":false,
+ "beam_width":1,
+ "beam_density":0.0,
+ "beam_update_prob":0.0,
+ "incorrect_spans_key":null
+ }
parser/model ADDED
Binary file (309 kB).
 
parser/moves ADDED
@@ -0,0 +1 @@
+ οΏ½οΏ½movesοΏ½οΏ½{"0":{"":406716},"1":{"":267231},"2":{"advmod":56960,"nsubj":53520,"compound:nn":43919,"dep":40111,"punct":36035,"case":23986,"nmod:assmod":21599,"nmod:prep":20098,"amod":16922,"acl":11979,"conj":10687,"cop":7238,"det":7210,"nummod":6994,"cc":6235,"aux:modal":5566,"nmod:tmod":5335,"nmod":4915,"neg":4363,"xcomp":3881,"appos":2955,"nmod:topic":2410,"discourse":2163,"advmod:loc":1591,"aux:prtmod":1539,"aux:ba":1311,"auxpass":1220,"advmod:dvp":1142,"advcl:loc":1046,"name":1032,"compound:vc":830,"nmod:poss":560,"amod:ordmod":511,"dobj":406,"nsubjpass":263,"nsubj:xsubj||ccomp":62,"parataxis:prnmod":34,"nsubj:xsubj":32},"3":{"punct":74006,"dobj":45383,"conj":30040,"case":30024,"dep":18660,"ccomp":17216,"mark":16600,"mark:clf":11551,"aux:asp":7896,"discourse":3998,"advmod:rcomp":2387,"nmod:range":1885,"cc":1675,"nmod:prep":1595,"advmod":1116,"etc":941,"compound:vc":790,"parataxis:prnmod":693,"advmod:loc":522,"neg":69,"advcl:loc":39,"acl":39},"4":{"ROOT":34525}}οΏ½cfgοΏ½οΏ½neg_keyοΏ½
senter/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+
+ }
senter/model ADDED
Binary file (190 kB).
 
tagger/cfg ADDED
@@ -0,0 +1,40 @@
+ {
+ "labels":[
+ "AD",
+ "AS",
+ "BA",
+ "CC",
+ "CD",
+ "CS",
+ "DEC",
+ "DEG",
+ "DER",
+ "DEV",
+ "DT",
+ "ETC",
+ "FW",
+ "IJ",
+ "INF",
+ "JJ",
+ "LB",
+ "LC",
+ "M",
+ "MSP",
+ "NN",
+ "NR",
+ "NT",
+ "OD",
+ "ON",
+ "P",
+ "PN",
+ "PU",
+ "SB",
+ "SP",
+ "URL",
+ "VA",
+ "VC",
+ "VE",
+ "VV",
+ "X"
+ ]
+ }
tagger/model ADDED
Binary file (14.3 kB).
 
tok2vec/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+
+ }
tok2vec/model ADDED
Binary file (6.59 MB).
 
tokenizer/cfg ADDED
@@ -0,0 +1,3 @@
+ {
+ "segmenter":"pkuseg"
+ }
tokenizer/pkuseg_model/features.msgpack ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd4322482a7018b9bce9216173ae9d2848efe6d310b468bbb4383fb55c874a18
+ size 22685181
tokenizer/pkuseg_model/weights.npz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ada075eb25a854f71d6e6fa4e7d55e7be0ae049255b1f8f19d05c13b1b68c9e
+ size 37508754
tokenizer/pkuseg_processors ADDED
Binary file (4.53 MB).
 
vocab/key2row ADDED
@@ -0,0 +1 @@
+ οΏ½
vocab/lookups.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76be8b528d0075f7aae98d6fa57a6d3c83ae480a8469e668d7b0af968995ac71
+ size 1
vocab/strings.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1ac3189551d59dc58dc9f4e3c525da1ec4ad3890362a1efd3ad904d2c698d077
+ size 1217934
vocab/vectors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:14772b683e726436d5948ad3fff2b43d036ef2ebbe3458aafed6004e05a40706
+ size 128
zh_core_web_sm-any-py3-none-any.whl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dad1f66fb6b3981c4986c7203332910a369eb42295ba9bc7ca36a3928ea73fe9
+ size 49466044
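
Several of the files added above are stored as Git LFS pointers: three-line text stubs with `version`, `oid`, and `size` keys. Below is a minimal sketch of reading such a pointer in Python. It assumes only the `key value` layout visible in this diff; `LfsPointer` and `parse_lfs_pointer` are hypothetical names for illustration, not part of Git LFS or the Hub tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    """Fields of a Git LFS pointer file (hypothetical helper type)."""
    version: str
    hash_algo: str
    digest: str
    size: int


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is a key, one space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    # The oid has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


# Pointer for zh_core_web_sm-any-py3-none-any.whl, copied from the diff above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:dad1f66fb6b3981c4986c7203332910a369eb42295ba9bc7ca36a3928ea73fe9\n"
    "size 49466044\n"
)
print(pointer.hash_algo, pointer.size)
```

The `size` field is the byte size of the real object, so a check like this could be used to confirm that a downloaded wheel matches its pointer before installing it.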