[CFP] (updated deadlines and data set) 2nd IWPT Shared Task on Enhanced Universal Dependencies Parsing

============== IWPT 2021 EUD SHARED TASK ==============

The IWPT 2021 conference (https://iwpt21.sigparse.org), collocated with ACL 2021, hosts the 2nd Shared Task on Enhanced Universal Dependencies Parsing.

** updates **
- schedule corrected (now synced with website)
- a new data set is included (English GUM Treebank)
- registration via the shared-task mailing list

web: https://universaldependencies.org/iwpt21/

===== Summary =====

Following the success of the 1st IWPT Enhanced Universal Dependency Parsing shared task in 2020, a second edition of the IWPT Parsing Shared Task is launched, with an emphasis on the parsing of Enhanced Universal Dependencies (which often reflect deeper syntactic structures, represented as more complex graphs, than regular surface dependencies).

https://iwpt21.sigparse.org/sharedtask
Webpage: https://universaldependencies.org/iwpt21/

Release of training and dev. data: April 6
Test data release: May 1
Test submission deadline: May 22
System paper deadline: June 6
Camera-ready deadline: June 30
Timezone: Anywhere on Earth (UTC-12)

Interested parties are encouraged to subscribe to the shared task mailing list at http://sympa.inria.fr/sympa/info/iwptsharedtask.

===== Introduction =====

The IWPT 2021 Shared Task will be on Multilingual Parsing into Enhanced Universal Dependencies (EUD). In recent years, Universal Dependencies (UD), the de facto standard target representation in surface-syntactic dependency parsing, have grown a second layer of structure, called enhanced dependencies, which encodes grammatical relations that cannot be adequately represented in pure rooted trees, for example control relations and argument sharing in relative clauses, shared dependencies involving coordinate structures, and dependencies involving ellipsis. Enhanced dependencies call for non-tree graphs with reentrancies, cycles, and empty nodes.

Data for the shared task consists of at least the treebanks in UD release 2.5 that contain enhanced annotation, and potentially one or more additional languages/treebanks. The task will be parsing from raw strings into EUD according to the guidelines at https://universaldependencies.org/u/overview/enhanced-syntax.html. On top of classic F-measure metrics, evaluation will measure performance per phenomenon and will take into account the fact that not all treebanks cover all of the phenomena listed in the EUD guidelines.

===== Task Description and Evaluation =====

We invite participants to develop a system for parsing raw text into Enhanced Universal Dependencies for all of the languages included in the training data. The task is similar to that of the CoNLL 2017 and 2018 shared tasks on parsing into UD, except that the primary evaluation metric is now computed on the enhanced dependency annotation. Participants are encouraged to consider all enhancements listed in the UD guidelines, even if some of these enhancements are absent from some of the treebanks included in the training data. Evaluation will take into account the fact that some treebanks are incomplete in this respect. Participants are also encouraged to predict all lower levels of annotation (lemma, tag, morphological features, basic dependency tree). These annotations will be evaluated as secondary metrics. It is also possible to train a pre-existing parser (such as UDPipe), use it to predict the lower levels of annotation, and then develop one’s own system that focuses on the transition from the basic UD tree to the enhanced UD graph.
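
As an illustration of the last option, the simplest conceivable "transition" from a basic tree to an enhanced graph is to copy each token's basic head and relation into the DEPS column. The sketch below is only a naive starting point, not a method proposed by the organizers; the function name copy_basic_to_deps and the example sentence are hypothetical, and it assumes well-formed 10-column CoNLL-U input.

```python
def copy_basic_to_deps(conllu_sentence: str) -> str:
    """Fill DEPS (column 9) with 'HEAD:DEPREL' taken from the basic tree.

    Naive baseline only: it adds none of the actual enhancements
    (no shared subjects, no propagated conjuncts, no empty nodes).
    """
    out = []
    for line in conllu_sentence.splitlines():
        # Keep comments and blank lines unchanged.
        if not line or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Multiword-token ranges (e.g. '1-2') and empty nodes (e.g. '8.1')
        # have no basic HEAD/DEPREL to copy, so leave them as they are.
        if "-" in cols[0] or "." in cols[0]:
            out.append(line)
            continue
        cols[8] = f"{cols[6]}:{cols[7]}"
        out.append("\t".join(cols))
    return "\n".join(out)


# Hand-written example with an empty DEPS column ('_').
BASIC = "\n".join("\t".join(row) for row in [
    ["1", "She", "she", "PRON", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "wants", "want", "VERB", "_", "_", "0", "root", "_", "_"],
    ["3", "to", "to", "PART", "_", "_", "4", "mark", "_", "_"],
    ["4", "leave", "leave", "VERB", "_", "_", "2", "xcomp", "_", "_"],
])
ENHANCED = copy_basic_to_deps(BASIC)
```

A real system would then add or rewrite edges on top of this output, for instance the extra subject edge that links "She" to "leave" in the control construction above.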

== Training Data ==

The evaluation will be done on 17 languages from 4 language families: Arabic, Bulgarian, Czech, Dutch, English, Estonian, Finnish, French, Italian, Latvian, Lithuanian, Polish, Russian, Slovak, Swedish, Tamil, Ukrainian. The language selection is driven simply by the fact that at least partial enhanced representation is available for the given language. Training and development data in the CoNLL-U format are available on the shared task website. These datasets are based on the UD release 2.7 but the annotation is often not identical to the corresponding treebank in UD 2.7. Nevertheless, the participants are also allowed to use the training and development data from the official UD 2.7 release package on Lindat, even in languages that are not part of the shared task evaluation. No other version of UD (either previous releases or GitHub repositories or other copies and clones online) can be used in the shared task; this is to avoid the danger of incompatible training-test splits.
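
For readers new to the format: in CoNLL-U, the enhanced graph lives in the ninth column (DEPS) as a "|"-separated list of head:relation pairs. The following minimal sketch reads those edges from one sentence. The helper name parse_deps and the example sentence are hypothetical and hand-written for illustration; they are not taken from the shared task data.

```python
from typing import Set, Tuple

# (dependent id, head id, full relation label); ids stay strings so that
# empty-node ids such as '8.1' are preserved.
Edge = Tuple[str, str, str]


def parse_deps(conllu_sentence: str) -> Set[Edge]:
    """Collect the enhanced dependency edges of one CoNLL-U sentence."""
    edges: Set[Edge] = set()
    for line in conllu_sentence.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token_id, deps = cols[0], cols[8]
        if "-" in token_id or deps == "_":
            continue  # multiword-token range lines carry no dependencies
        for item in deps.split("|"):
            # Split on the first ':' only, so 'nsubj:xsubj' stays intact.
            head, _, label = item.partition(":")
            edges.add((token_id, head, label))
    return edges


# "She" is the subject of "wants" and, via control, also of "leave":
# the token has two incoming edges, so the structure is a graph, not a tree.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "She", "she", "PRON", "_", "_", "2", "nsubj", "2:nsubj|4:nsubj:xsubj", "_"],
    ["2", "wants", "want", "VERB", "_", "_", "0", "root", "0:root", "_"],
    ["3", "to", "to", "PART", "_", "_", "4", "mark", "4:mark", "_"],
    ["4", "leave", "leave", "VERB", "_", "_", "2", "xcomp", "2:xcomp", "_"],
])
EDGES = parse_deps(SAMPLE)
```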

== Evaluation Metric ==

The prime evaluation metric is LAS on enhanced dependencies (DEPS), where LAS is defined as F1-score over the set of enhanced dependencies in the system output and the gold standard. Complete edge labels are taken into account, i.e. conj:and differs from conj. While some effort has gone into ensuring that the data in the various treebanks is annotated consistently w.r.t. the level of enhancements and the format of enhanced labels, not all treebanks include all of the enhancements listed in the UD guidelines. For those treebanks, an additional evaluation will be carried out, where dependencies that are the consequence of including enhancement E, where E is not included in the training data of that treebank, are ignored during evaluation.
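
The definition above can be sketched as a set comparison. This is a simplified illustration, not the official evaluation script: the function name elas_f1 and the toy edges are hypothetical, and the sketch assumes that system and gold tokens are already aligned, which a real scorer for raw-text parsing has to establish first.

```python
from typing import Set, Tuple

Edge = Tuple[str, str, str]  # (dependent id, head id, full relation label)


def elas_f1(gold: Set[Edge], system: Set[Edge]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over enhanced dependency edges.

    An edge counts as correct only if dependent, head and the complete
    label match, so ('4', '2', 'conj') does not match ('4', '2', 'conj:and').
    """
    correct = len(gold & system)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: the system gets three of four edges right and predicts
# plain 'conj' where the gold standard has 'conj:and'.
GOLD = {("1", "2", "nsubj"), ("2", "0", "root"), ("3", "4", "cc"), ("4", "2", "conj:and")}
SYSTEM = {("1", "2", "nsubj"), ("2", "0", "root"), ("3", "4", "cc"), ("4", "2", "conj")}
P, R, F1 = elas_f1(GOLD, SYSTEM)  # each is 0.75 for this example
```

The additional evaluation for incomplete treebanks would, under this view, remove the edges attributable to a missing enhancement from both sets before scoring; how those edges are identified is not specified here and is left out of the sketch.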

===== Shared Task Schedule =====

Shared Task papers will be published in the "Working notes of the IWPT 2021 Enhanced Dependency Parsing Shared Task".

===== Shared Task Organizers =====

Gosse Bouma (University of Groningen, Netherlands)
Djamé Seddah (Inria Paris, France)
Daniel Zeman (Charles University, Czechia)

===== IWPT 2021 Organizers =====

Yuji Matsumoto
Stephan Oepen
Kenji Sagae
Weiwei Sun
Anders Søgaard
Reut Tsarfaty

===== Contact details =====

- Mail: iwptsharedtask at gmail.com
- Webpage: https://universaldependencies.org/iwpt21/
- Mailing list: https://sympa.inria.fr/sympa/info/iwptsharedtask
- IWPT 2021 website: https://iwpt21.sigparse.org

