[Corpora-List] Release 11.0 of the TüBa-D/Z German Treebank

Marie Hinrichs marie.hinrichs at uni-tuebingen.de
Fri Jun 29 16:51:09 CEST 2018


*** The Tübingen Treebank of Written German (TüBa-D/Z) - Final Release 11.0 ***

The Department of Linguistics of the University of Tübingen (Germany) is pleased to announce Release 11.0 of the TüBa-D/Z, a referentially and syntactically annotated German corpus. In addition to the previously released formats, this release also contains the treebank in an automatically converted CoNLL-U format. This will be the FINAL release, although we would still like to correct the CoNLL-U trees manually if possible.

This final release is dedicated to and in memory of Dr. Heike Telljohann. The high quality of the treebank is largely owed to her commitment to the project, diligence, and attention to detail over many years.

The TüBa-D/Z treebank is a manually annotated German newspaper corpus based on data taken from the daily issues of 'die tageszeitung'. It currently comprises 3,816 newspaper articles (104,787 sentences; 1,959,474 tokens).

The syntactic annotation scheme of the TüBa-D/Z distinguishes four levels of syntactic constituency (lexical, phrasal, clausal, topological fields) and contains the following annotation layers:

* inflectional morphology
* lemmas
* syntactic constituency
* grammatical functions
* (complex) named entities including semantic classification
* anaphora and coreference relations
* discourse connectives (explicit and implicit, partial coverage)
* GermaNet word senses
* dependency relations (automatically created)
* chunk annotation (automatically created)

New in this Release:

* An additional 172 articles (9,192 sentences; 171,673 tokens) have been annotated.
* STYLEBOOK: The annotation stylebook has been updated and can be found on the webpage.
* CoNLL-U format, automatically generated
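
For readers who have not worked with CoNLL-U before, the following is a minimal sketch of how a file in that format can be read. It relies only on the general CoNLL-U conventions (ten tab-separated columns per token, comment lines starting with "#", blank lines between sentences). The sample sentence, the function names read_conllu and count_words, and the dictionary keys are assumptions made for illustration; the sentence is not taken from the TüBa-D/Z, and the actual distribution files may carry additional metadata.

```python
# Minimal CoNLL-U reader (illustrative sketch, not part of the release).
# The ten column names follow the general CoNLL-U convention.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Made-up sample sentence; NOT taken from the TüBa-D/Z.
SAMPLE = (
    "# text = Das ist gut.\n"
    "1\tDas\tder\tPRON\t_\t_\t3\tnsubj\t_\t_\n"
    "2\tist\tsein\tAUX\t_\t_\t3\tcop\t_\t_\n"
    "3\tgut\tgut\tADJ\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "\n"
)


def read_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comment or metadata; skipped in this sketch.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        current.append(dict(zip(FIELDS, cols)))
    if current:
        # Handle a final sentence that lacks a trailing blank line.
        sentences.append(current)
    return sentences


def count_words(sentence):
    """Count syntactic words, skipping ranges like '1-2' and empty nodes like '1.1'."""
    return sum(1 for tok in sentence if tok["id"].isdigit())


sentences = read_conllu(SAMPLE)
total_words = sum(count_words(s) for s in sentences)
print(len(sentences), total_words)
```

To apply the same functions to a file, pass its contents to read_conllu, for example read_conllu(open(path, encoding="utf-8").read()), where path is whatever file name the distribution uses.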

The license for the TüBa-D/Z is granted free of charge for scientific use. For more information, please visit the website at: http://www.sfs.uni-tuebingen.de/en/ascl/resources/corpora/tueba-dz.html

Best regards,

Erhard W. Hinrichs
Marie Hinrichs

------------
Dept. of Computational Linguistics
University of Tübingen
Wilhelmstr. 19
72074 Tübingen
Germany


