datasets:
- conll2012_ontonotesv5
language:
- en
pipeline_tag: text2text-generation
Given an input text, the model generates the entities it finds as a single string in the format `"{ENT_TYPE}:{span}; {ENT_TYPE}:{span}..."`.
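
The output format above can be produced and read back with a few lines of Python. The following is a minimal sketch, not the code used to train this model: the helper names `format_entities` and `parse_entities` are hypothetical, and the parser assumes that neither entity types nor spans contain `;` and that entity types contain no `:`.

```python
from typing import List, Tuple

Entity = Tuple[str, str]  # (entity type, surface span)


def format_entities(entities: List[Entity]) -> str:
    """Serialize (type, span) pairs into the "{ENT_TYPE}:{span}; ..." string."""
    return "; ".join(f"{ent_type}:{span}" for ent_type, span in entities)


def parse_entities(output: str) -> List[Entity]:
    """Parse a generated string back into (type, span) pairs.

    Assumption: items are separated by ";" and the first ":" in each item
    separates the entity type from the span. Items without a ":" are
    skipped instead of raising, since generated text may be malformed.
    """
    entities: List[Entity] = []
    for item in output.split(";"):
        item = item.strip()
        if not item:
            continue
        ent_type, sep, span = item.partition(":")
        if not sep:
            continue
        entities.append((ent_type.strip(), span.strip()))
    return entities
```

A span that itself contains a semicolon would be split incorrectly by this parser, so a stricter scheme would be needed if such spans occur in the data.
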
To speed up training, we use only the first 10,000 sentences (not documents) from the train set and 1,000 sentences from the validation set.\
We save the model checkpoint at which the validation loss (NLL) reaches its minimum.
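
The sentence-level subsetting and the keep-the-best-checkpoint rule can be sketched as follows. This is an illustration under stated assumptions rather than the original training script: it assumes each document is a dict with a `"sentences"` list, and `first_n_sentences` and `BestCheckpoint` are hypothetical names.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional


def first_n_sentences(documents: Iterable[Dict], n: int) -> List[Dict]:
    """Flatten documents into sentences and keep only the first n sentences."""

    def iter_sentences() -> Iterator[Dict]:
        for doc in documents:
            # Assumption: each document stores its sentences under "sentences".
            yield from doc["sentences"]

    return list(islice(iter_sentences(), n))


class BestCheckpoint:
    """Track the lowest validation loss seen so far."""

    def __init__(self) -> None:
        self.best_loss: float = float("inf")
        self.best_epoch: Optional[int] = None

    def update(self, epoch: int, val_loss: float) -> bool:
        """Return True when val_loss is a new minimum, i.e. when to save."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            return True
        return False
```

In a training loop, the model would be saved whenever `update` returns `True`, so the file on disk always corresponds to the minimum validation loss observed.
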
The model can serve as a pretrained backbone for fine-tuning on downstream NER tasks.