[Corpora-List] Release 2.0 of GenitivDB - Database for German Genitive Classification

Roman Schneider schneider at ids-mannheim.de
Tue Sep 29 12:21:46 CEST 2015


RELEASE 2.0 OF GENITIVDB - DATABASE FOR GERMAN GENITIVE CLASSIFICATION

We are pleased to announce release 2.0 of GenitivDB - the first database for German Genitive Classification. It is available for public access online; the underlying dataset can be downloaded for scientific purposes.

GenitivDB is a novel NLP resource for the explanation of linguistic phenomena, built and evaluated by exploring very large annotated language corpora. It can be used for the notoriously controversial classification and prediction of German genitive endings (short endings, long endings, zero-marker).

For its compilation, we used the DeReKo Reference Corpus, which is the largest linguistic resource worldwide for the study of written German. The corpus data served as the basis for extracting all relevant genitive forms. After several refinements, the resulting collection comprises 650,726 types and 9,541,753 tokens. All findings are enriched with linguistic metadata (morphosyntactic information, phonetic and prosodic data, context information, etc.) as well as extra-linguistic metadata (year of publication, country/region of origin, media type, thematic domain, etc.), for a total of more than 80 different metadata types.

NEW FEATURES OF THE GENITIVDB 2.0 DATASET ARE:

* toponym identification as additional metadata type
* improved identification of proper nouns
* improved identification of time expressions
* adjusted score points (genitive probability value)
* various minor corrections (assignment of genitive endings, handling of zero-markers, etc.)

NEW FEATURES OF THE GENITIVDB ONLINE FORM ARE:

* additional search options (metadata types)
* computation of data distribution
* statistical exploration and visualization via R-based statistics tool

ONLINE ACCESS AND DOWNLOAD

http://www.ids-mannheim.de/genitivdb/

CITATION

Bubenhofer, Noah / Hansen, Sandra / Konopka, Marek / Schneider, Roman (2015): GenitivDB 2.0 - Datenbank zur Genitivmarkierung (Release vom 01.09.2015). Mannheim: Institut für Deutsche Sprache. http://www.ids-mannheim.de/genitivdb

Please tell us whenever you publish work based on GenitivDB: grammis at ids-mannheim.de

--
Dr. Roman Schneider
Institut für Deutsche Sprache, R5 6-13, D-68161 Mannheim
Tel: +49(621) 1581-419 / Fax: +49(621) 1581-200
http://www.ids-mannheim.de/gra/personal/schneider.html


