*** The Tübingen Treebank of Written German (TüBa-D/Z) - Release 10.0 ***
The Department of Linguistics of the University of Tübingen (Germany) is pleased to announce Release 10.0 of the TüBa-D/Z, a referentially and syntactically annotated German corpus.
The TüBa-D/Z treebank is a manually annotated German newspaper corpus based on data taken from the daily issues of 'die tageszeitung'. It currently comprises 3,644 newspaper articles (95,595 sentences; 1,787,801 tokens).
The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels of syntactic constituency (lexical, phrasal, clausal, topological fields) and contains the following annotation layers:
* inflectional morphology
* lemmas
* syntactic constituency
* grammatical functions
* (complex) named entities including semantic classification
* anaphora and coreference relations
* discourse connectives (explicit and implicit, partial coverage)
* GermaNet word senses
* dependency relations (automatically created)
* chunk annotation (automatically created)
New in this Release:
* An additional 200 articles (10,237 sentences; 217,885 tokens) have been annotated.
* STYLEBOOK: The annotation stylebook has been updated and can be found on the webpage.
* Also included (since minor Release 9.1) are 17,910 manual annotations of a selected set of lemmas (30 nouns, 79 verbs) with their corresponding senses in the German wordnet GermaNet, with the goal of providing a gold standard for word sense disambiguation.
The license for the TüBa-D/Z is granted free of charge for scientific use. For more information, please visit the website at: http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html
Erhard W. Hinrichs
Heike Telljohann
Marie Hinrichs
------------
Dept. of Computational Linguistics
University of Tübingen
Wilhelmstr. 19
72074 Tübingen
Germany