LatinCy

Synthetic trained spaCy pipelines for Latin NLP

Developed by Patrick J. Burns, 2023.

Paper

Details on model training and datasets are given in the following paper: Burns, P.J. 2023. “LatinCy: Synthetic Trained Pipelines for Latin NLP.” https://arxiv.org/abs/2305.04365v1.

Citation

@misc{burns_latincy_2023,
    title = {{LatinCy}: Synthetic Trained Pipelines for Latin {NLP}},
    author = {Burns, Patrick J.},
    url = {https://arxiv.org/abs/2305.04365v1},
    shorttitle = {{LatinCy}},
    abstract = {This paper introduces {LatinCy}, a set of trained general purpose Latin-language "core" pipelines for use with the {spaCy} natural language processing framework. The models are trained on a large amount of available Latin data, including all five of the Latin Universal Dependency treebanks, which have been preprocessed to be compatible with each other. The result is a set of general models for Latin with good performance on a number of natural language processing tasks (e.g. the top-performing model yields {POS} tagging, 97.41\% accuracy; lemmatization, 94.66\% accuracy; morphological tagging 92.76\% accuracy). The paper describes the model training, including its training data and parameterization, and presents the advantages to Latin-language researchers of having a {spaCy} model available for {NLP} work.},
    date = {2023-05-07},
    langid = {english},
}

Spaces

Latincy Dashboard

Models

latincy/la_core_web_md

latincy/la_core_web_lg

latincy/la_core_web_trf

latincy/la_core_web_sm

latincy/latinbert2

latincy/la_vectors_floret_md

latincy/la_vectors_floret_lg
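
The four `la_core_web_*` entries above are spaCy pipelines, so a minimal usage sketch may be helpful. It assumes that spaCy is installed and that the chosen pipeline has been installed as a Python package under the same name as its repository (for example `la_core_web_sm`); installation is not shown. The helper names `load_pipeline` and `annotate` are hypothetical and exist only for this illustration; `spacy.load` and the token attributes `text`, `lemma_`, and `pos_` are standard spaCy API.

```python
import importlib.util

# Pipeline package names taken from the model list above.
PIPELINES = ["la_core_web_sm", "la_core_web_md", "la_core_web_lg", "la_core_web_trf"]


def load_pipeline(name="la_core_web_sm"):
    """Load a LatinCy pipeline; return None if spaCy or the package is missing.

    Hypothetical helper: assumes the pipeline is installed as a Python
    package whose name matches the repository name.
    """
    if name not in PIPELINES:
        raise ValueError(f"unknown pipeline: {name}")
    if importlib.util.find_spec("spacy") is None:
        return None
    if importlib.util.find_spec(name) is None:
        return None
    import spacy  # imported lazily so the sketch runs without spaCy

    return spacy.load(name)


def annotate(nlp, text):
    """Return (token, lemma, part-of-speech) triples for a Latin text."""
    return [(token.text, token.lemma_, token.pos_) for token in nlp(text)]


if __name__ == "__main__":
    nlp = load_pipeline()
    if nlp is None:
        print("Install spaCy and a LatinCy pipeline to run this example.")
    else:
        for row in annotate(nlp, "Gallia est omnis divisa in partes tres."):
            print(row)
```

The lazy import and the `None` return are design choices for this sketch only, so that the snippet degrades gracefully on a machine where no pipeline is installed; in a real project a plain `spacy.load("la_core_web_sm")` is likely all that is needed.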
