[Corpora-List] CorA (corpus annotator): first public release

Marcel Bollmann bollmann at linguistics.rub.de
Sat Jul 8 10:41:36 CEST 2017


Dear colleagues,

we are happy to announce the first public release of our corpus annotation tool CorA!

https://github.com/comphist/cora

CorA is a web-based tool for token-level annotation of texts, intended particularly for historical and other non-standard language data. Some of its features include:

* Annotation of normalized wordforms, parts-of-speech, morphology, and lemma
* Editing the source data (e.g., to correct mistakes in a transcription, or to change token boundaries)
* Calling external tools for automatic (pre-)annotation directly from the web interface

Comprehensive user documentation is available here:

https://cora.readthedocs.io/

Originally developed for the annotation of historical corpora of Early New High German[1][2], CorA has since been used in a variety of other projects, including the annotation of social media data.[3] Due to this widespread interest, we have decided to prepare this official release and make it available under the permissive MIT license.

For further questions, feel free to contact Marcel Bollmann <bollmann at linguistics.rub.de>.

[1] https://www.linguistics.rub.de/anselm/
[2] https://www.linguistics.rub.de/comphist/projects/ref/
[3] https://sites.google.com/site/empirist2015/home/shared-task-data

--
Marcel Bollmann, M.A.
Sprachwissenschaftliches Institut
Ruhr-Universität Bochum - 44780 Bochum - Germany
tel: +49 (0)234 32-22481
www: https://marcel.bollmann.me/