[Corpora-List] Anselm Corpus 1.0 released

Stefanie Dipper dipper at linguistics.rub.de
Thu Dec 20 18:41:23 CET 2018


We are happy to announce the public release of the Anselm Corpus, which is available from the following website:

https://www.linguistics.ruhr-uni-bochum.de/anselm/access/index.en.html

"Interrogatio Sancti Anselmi de Passione Domini" ('Questions by Saint Anselm about the Lord’s Passion') is a medieval religious treatise that survives in an exceptionally large number of written witnesses. In total, there are around 70 German manuscripts and prints, produced between the 14th and 16th centuries. The Anselm Corpus consists of 58 texts with 400,000 tokens in total.

In the texts, Anselm of Canterbury puts questions to the Virgin Mary concerning the Passion of Jesus Christ. She answers him in the form of longer monologues. While the texts are comparable in content and logical structure, and even show (semi-)parallelism in sentence structure and wording, they do not follow fixed spelling conventions and exhibit dialectal variation in graphematics, phonology, morphology, and syntax. This makes the corpus a highly interesting resource for comparative investigations in areas such as linguistics, history, or theology.

The transcriptions of the texts comprise two separate layers. The diplomatic layer records historical graphemes and conserves original word boundaries. Layout information, such as page or line breaks, refers to this layer. The second layer adapts word boundaries to the conventions of modern German and serves as the basis for all further linguistic annotations. The texts have been annotated with a normalized and a modernized wordform, part-of-speech tags (using a slightly modified version of the STTS tagset), morphology, and lemma. For detailed documentation, see https://www.linguistics.ruhr-uni-bochum.de/comphist/projects/anselm/.
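
The two-layer design described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class names, field names, and the sample wordforms are hypothetical and are not taken from the corpus or its XML format, and the sketch covers only the case where one diplomatic token corresponds to one or more tokens on the second layer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModToken:
    """Token on the second layer (modern German word boundaries).

    All linguistic annotations attach here. Field names are hypothetical.
    """
    form: str
    norm: Optional[str] = None    # normalized wordform
    modern: Optional[str] = None  # modernized wordform
    pos: Optional[str] = None     # part-of-speech tag (modified STTS)
    morph: Optional[str] = None   # morphology
    lemma: Optional[str] = None


@dataclass
class DiplToken:
    """Token on the diplomatic layer (original graphemes and word boundaries).

    Layout information such as line breaks refers to this layer.
    """
    form: str
    line: int
    mods: List[ModToken] = field(default_factory=list)


def modern_tokens(dipl_tokens: List[DiplToken]) -> List[ModToken]:
    """Flatten the diplomatic layer into the sequence of second-layer tokens."""
    return [m for d in dipl_tokens for m in d.mods]


# Invented sample, not corpus data: one historical spelling written as a
# single word, split into two tokens on the second layer.
example = [
    DiplToken(form="inder", line=1, mods=[
        ModToken(form="in", pos="APPR", lemma="in"),
        ModToken(form="der", pos="ART", lemma="der"),
    ]),
    DiplToken(form="stat", line=1, mods=[
        ModToken(form="stat", modern="Stadt", pos="NN", lemma="stat"),
    ]),
]

for tok in modern_tokens(example):
    print(tok.form, tok.pos)
```

Keeping the annotations on the second layer means that a search for a lemma or part-of-speech tag does not depend on where a scribe happened to place word boundaries, while the diplomatic forms and their layout positions stay available unchanged.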

The corpus can be accessed via ANNIS under the following URL:

https://www.linguistics.rub.de/annis/annis3/anselm/

The corpus is licensed under the Creative Commons Attribution-ShareAlike 3.0 license (CC BY-SA 3.0) and can also be downloaded in XML format from our website.


