WORKSHOP ON PARSING GERMAN
June 19 or 20, 2008
1st CALL FOR PAPERS
German possesses an interesting set of configurational properties on the syntactic level which make it far less flexible with respect to word order than other free word order languages. Analyses of these properties, which have formed a part of the traditional syntax of German since the early 19th century, only re-entered the mainstream of generative linguistics research within the last twenty years or so. In computational linguistics, however, their realization has varied quite widely: "topological fields" in HPSG-style analyses, multiple parse trees, special constraints on liberation in constraint-based dependency-style analyses, various hybrid "deep/shallow" approaches, and agnostic parameter estimation over graphs. This variation can also be felt acutely in the annotation of German treebanks. Many corpora have historically elected to annotate only a few of the different senses of the term "constituent" inherent to German syntax, resulting in standards that make German appear either more like English or more like Czech.
The aim of this workshop is to provide a forum for theoretical discussion as well as a shared task, based on the TIGER and TueBa-D/Z German treebanks, for these various approaches to make their case on empirical grounds. We believe this combination is essential to balancing the question of which structures merit learning against the ease with which they can be learned. Both treebanks are annotated collections of German newspaper text on similar topics. They are annotated with POS, morphology, phrase structure, and grammatical functions. TueBa-D/Z additionally uses topological fields to describe fundamental word order restrictions in German clauses. The treebanks differ significantly in their annotation schemes, however: while TIGER relies on crossing branches to describe long-distance relationships, TueBa-D/Z uses pure tree structures with designated labels for long-distance relationships. Additionally, the annotation in TIGER is flat on the phrasal level, while TueBa-D/Z annotates phrasal structure more hierarchically.
* constituent based approaches to parsing German
* dependency based approaches to parsing German
* treatment of long-distance relationships in German
* comparisons of parsing results for German to other free word order languages
The workshop will feature a shared task on parsing German. We will provide the following data sets:
* TIGER in constituent structure
* TIGER in dependency structure
* TueBa-D/Z in constituent structure
* TueBa-D/Z in dependency structure
The task will be to parse both treebanks using a single structural encoding, either constituent or dependency structure. The final ranking of systems will be based on results averaged over both treebanks. The data sets will be made available free of charge for the shared task, but they do require a license.
In order to take part in the shared task, participants should register their intent to participate by sending an email to skuebler at indiana.edu. More information will be made available to registered participants.
Release of training data: February 5, 2008
Release of test data: March 5, 2008
Submission of test results: March 10, 2008
Evaluation results available: March 12, 2008
Workshop Paper Submission deadline: March 17, 2008
Notifications sent to authors: April 4, 2008
Camera ready due: April 18, 2008
Workshop Dates: June 19 or 20, 2008
PAPER SUBMISSION INFORMATION
Submissions will consist of regular full papers of max. 8 pages, formatted following the ACL 2008 main session guidelines. In addition, shared task participants will be invited to submit short papers (max. 4 pages) describing their systems and/or their evaluation metrics. Both submission and review processes will be handled via the START system.
Berthold Crysman, Bonn
Amit Dubey, Edinburgh
Anette Frank, Heidelberg
Erhard Hinrichs, Tuebingen
Julia Hockenmeier, Illinois
Laura Kallmeyer, Tuebingen
Frank Keller, Edinburgh
Sandra Kuebler (co-chair)
Wolfgang Menzel, Hamburg
Stefan Mueller, Berlin
Stefan Oepen, Oslo
Gerald Penn (co-chair)
Helmut Schmid, Stuttgart
Gerold Schneider, Zuerich
Hans Uszkoreit, Saarbruecken
Josef van Genabith, Dublin
Sandra Kuebler
Indiana University
skuebler at indiana.edu

Gerald Penn
University of Toronto
gpenn at cs.toronto.edu