[Corpora-List] New Release of the TüBa-D/Z German Treebank

Marie Hinrichs marie.hinrichs at uni-tuebingen.de
Fri Dec 19 11:48:47 CET 2014


The Department of Linguistics of the University of Tübingen (Germany) is pleased to announce a new minor release of its referentially and syntactically annotated German corpus:

The Tübingen Treebank of Written German (TüBa-D/Z) - Release 9.1

******************************************************************

The TüBa-D/Z treebank is a manually annotated German newspaper corpus based on data taken from daily issues of the newspaper 'die tageszeitung'. It currently comprises 85,358 sentences (1,569,916 words; 3,444 newspaper articles).

This minor release includes 17,910 manual annotations of a selected set of lemmas (30 nouns, 79 verbs) with their corresponding senses in the German wordnet GermaNet with the goal of providing a gold standard for word sense disambiguation. Please note that no new sentences have been added between release 9.0 and release 9.1. Only those formats that support word sense annotation are part of this minor release (Negra Export 3 and 4, CoNLL 2011/2012, Export XML). Other formats remain unchanged and can be obtained from release 9.0.

The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels of syntactic constituency (lexical, phrasal, clausal, topological fields) and contains the following annotation layers:

* inflectional morphology
* lemmas
* syntactic constituency
* grammatical functions
* (complex) named entities including semantic classification
* anaphora and coreference relations
* discourse connectives (explicit and implicit, partial coverage)
* GermaNet word senses
* dependency relations (automatically created)
* chunk annotation (automatically created)

The license for TüBa-D/Z is granted free of charge for scientific use. For more information, please visit the website at: http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html

Best Regards,

Erhard W. Hinrichs
Heike Telljohann
Marie Hinrichs

------------
Dept. of Computational Linguistics
University of Tübingen
Wilhelmstr. 19
72074 Tübingen
Germany


