Commit 0ac7ea5
Parent(s): a58b801
Add NFKC normalization of full-width symbols and digits to the tokenizer (#2)
- Add NFKC normalization of full-width symbols and digits to the tokenizer (1dc9fb6bc4d845bf6b52587914a953501cad8509)
Co-authored-by: Kanta Hayashi <[email protected]>
- tokenizer.json +3 -0
tokenizer.json CHANGED
@@ -124,6 +124,9 @@
   "normalizer": {
     "type": "Sequence",
     "normalizers": [
+      {
+        "type": "NFKC"
+      },
       {
         "type": "Replace",
         "pattern": {
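
For context, here is a minimal sketch of what an NFKC step does to full-width input. It uses Python's standard `unicodedata` module rather than the tokenizer itself, so it only illustrates the Unicode normalization form; the helper name `nfkc` and the sample strings are illustrative assumptions, not taken from this commit.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization (hypothetical helper, for illustration only)."""
    return unicodedata.normalize("NFKC", text)


# Full-width Latin letters, digits, and symbols fold to their ASCII forms,
# while ordinary kanji are left unchanged.
samples = ["ＡＢＣ", "１２３", "！？", "漢字"]
for s in samples:
    print(repr(s), "->", repr(nfkc(s)))
```

Because the new entry is placed first in the `Sequence`, the existing `Replace` normalizers should see text that has already been folded this way.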