[Corpora-List] CFP (deadline extension: April 22) 1st IWPT Shared Task on Enhanced Universal Dependencies Parsing

djame seddah djame.seddah at gmail.com
Fri Mar 27 19:27:32 CET 2020

(apologies for cross-posting)

Dear everyone,

Given the current circumstances and the burden on everyone, the system submission deadlines of the 1st IWPT Shared Task on Enhanced Dependencies Parsing have been extended to Wednesday, April 22. Everyone interested is more than welcome to participate.

Important dates*:

• February 5: training data + eval script
• April 1: blind test data available
• April 22: system submission deadline
• May 6: system description papers
• May 18: camera-ready papers
• July 9: IWPT at ACL 2020

Website: https://universaldependencies.org/iwpt20/

contact: iwptsharedtask at gmail.com

registration: Subscribe to the mailing-list at: https://sympa.inria.fr/sympa/info/iwptsharedtask

============ IWPT 2020 EUD SHARED TASK ============

The IWPT 2020 conference (https://iwpt20.sigparse.org), co-located with ACL 2020, hosts the 1st Shared Task on Enhanced Dependencies Parsing.

===== Summary =====

Following the success of the CoNLL Universal Dependencies Parsing shared tasks in 2017 and 2018 and the SemEval Semantic Dependency Parsing tasks in 2014 and 2015, this year's shared task focuses on parsing into Enhanced Universal Dependencies, which often reflect deeper syntactic structures and are represented as more complex graphs than regular surface dependencies.

Webpage: https://universaldependencies.org/iwpt20/
Test submission deadline: April 22, 2020 (23:59, GMT-12, "anywhere on Earth")
Test data release: April 1, 2020

Interested parties are encouraged to subscribe to the shared task mailing list at http://sympa.inria.fr/sympa/info/iwptsharedtask.

===== Introduction =====

The IWPT 2020 Shared Task will be on Multilingual Parsing into Enhanced Universal Dependencies (EUD). In recent years, Universal Dependencies (UD), the de facto standard target representation in surface-syntactic dependency parsing, has grown a second layer of structure, called enhanced dependencies. This layer encodes grammatical relations that cannot be adequately represented in pure rooted trees, for example control relations and argument sharing in relative clauses, shared dependencies involving coordinate structures, and dependencies involving ellipsis. Enhanced dependencies call for non-tree graphs with reentrancies, cycles, and empty nodes.

Data for the shared task consists of at least the treebanks in UD release 2.5 that contain enhanced annotation, and potentially one or more additional languages/treebanks. The task will be parsing from raw strings into EUD according to the guidelines at https://universaldependencies.org/u/overview/enhanced-syntax.html. In addition to classic F-measure metrics, evaluation will measure performance per phenomenon and will take into account the fact that not all treebanks cover all of the phenomena listed in the EUD guidelines.

===== Task Description and Evaluation =====

We invite participants to develop a system for parsing raw text into enhanced universal dependencies for all of the languages included in the training data. The task is similar to that of the CoNLL 2017 and 2018 shared tasks on parsing into UD, except that the prime evaluation metric is now computed on the enhanced dependency annotation. Participants are encouraged to consider all enhancements listed in the UD guidelines, even if some of these enhancements are absent from some of the treebanks included in the training data. Evaluation will take into account the fact that some treebanks are incomplete in this respect. Participants are also encouraged to predict all lower levels of annotation (lemma, tag, morphological features, basic dependency tree). These annotations will be evaluated as secondary metrics. It is also possible to train a pre-existing parser (such as UDPipe), use it to predict the lower levels of annotation, and then develop one’s own system that focuses on the transition from the basic UD tree to the enhanced UD graph.

== Training Data ==

The evaluation will be done on 17 languages from 4 language families: Arabic, Bulgarian, Czech, Dutch, English, Estonian, Finnish, French, Italian, Latvian, Lithuanian, Polish, Russian, Slovak, Swedish, Tamil, Ukrainian. The language selection is driven simply by the fact that at least partial enhanced representation is available for the given language.

Training and development data in the CoNLL-U format are available on the shared task website. These datasets are based on UD release 2.5, but the annotation is often not identical to the corresponding treebank in UD 2.5. Nevertheless, participants are also allowed to use the training and development data from the official UD 2.5 release package on Lindat, even for languages that are not part of the shared task evaluation. No other version of UD (previous releases, GitHub repositories, or other copies and clones online) can be used in the shared task; this is to avoid the danger of incompatible training-test splits.
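For readers new to the format: in CoNLL-U each token line has ten tab-separated columns, and the enhanced graph is stored in the ninth column (DEPS) as a |-separated list of head:label pairs, with "_" for an empty value. The following is a minimal Python sketch of reading that column. The function names and the sample sentence are illustrative assumptions and are not part of the shared task tooling.

```python
def parse_deps(deps_field):
    """Split a DEPS value such as '2:nsubj|4:nsubj' into (head, label) pairs.

    Heads stay strings because empty nodes use decimal IDs such as '5.1'.
    Labels may themselves contain colons (e.g. 'conj:and'), so we split
    only on the first colon.
    """
    if deps_field == "_":
        return []
    pairs = []
    for item in deps_field.split("|"):
        head, label = item.split(":", 1)
        pairs.append((head, label))
    return pairs


def read_enhanced_edges(conllu_text):
    """Collect (sentence_index, head, dependent, label) tuples from CoNLL-U text."""
    edges = set()
    sentence_index = 0
    seen_token = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if seen_token:
                sentence_index += 1
                seen_token = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        columns = line.split("\t")
        if len(columns) != 10:
            raise ValueError("expected 10 tab-separated columns: %r" % line)
        node_id = columns[0]
        seen_token = True
        if "-" in node_id:
            continue  # multiword token range such as '1-2' carries no DEPS
        for head, label in parse_deps(columns[8]):
            edges.add((sentence_index, head, node_id, label))
    return edges


# Hypothetical sample sentence, written with spaces and converted to tabs.
SAMPLE_ROWS = [
    "1 She she PRON _ _ 2 nsubj 2:nsubj|4:nsubj _",
    "2 reads read VERB _ _ 0 root 0:root _",
    "3 and and CCONJ _ _ 4 cc 4:cc _",
    "4 writes write VERB _ _ 2 conj 2:conj:and _",
]
SAMPLE = "\n".join("\t".join(row.split()) for row in SAMPLE_ROWS) + "\n\n"

for edge in sorted(read_enhanced_edges(SAMPLE)):
    print(edge)
```

Note that token 1 has two incoming edges here (it is the shared subject of both verbs), which is exactly the kind of structure a basic dependency tree cannot express.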

== Evaluation Metric ==

The prime evaluation metric is LAS on enhanced dependencies (DEPS), where LAS is defined as the F1 score over the set of enhanced dependencies in the system output and in the gold standard. Complete edge labels are taken into account, i.e. conj:and differs from conj.
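To make the definition concrete, here is a minimal Python sketch of an F1 score over two sets of labeled edges. It is not the official evaluation script: the function name and the toy edge sets are assumptions for illustration, and it assumes that gold and system tokens are already aligned, which a real evaluation of raw-text parsing has to handle separately.

```python
def enhanced_las(gold_edges, system_edges):
    """F1 over labeled enhanced edges, each given as (head, dependent, label).

    An edge counts as correct only if head, dependent, and the complete
    label all match, so 'conj:and' and 'conj' are different edges.
    """
    gold = set(gold_edges)
    system = set(system_edges)
    correct = len(gold & system)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical): the system misses one shared-subject edge
# and predicts 'conj' where the gold standard has 'conj:and'.
gold = {
    ("2", "1", "nsubj"),
    ("4", "1", "nsubj"),
    ("0", "2", "root"),
    ("4", "3", "cc"),
    ("2", "4", "conj:and"),
}
system = {
    ("2", "1", "nsubj"),
    ("0", "2", "root"),
    ("4", "3", "cc"),
    ("2", "4", "conj"),
}

score = enhanced_las(gold, system)
print("enhanced LAS (F1): %.4f" % score)
```

In this toy case 3 of the 4 predicted edges are correct (precision 0.75) and 3 of the 5 gold edges are found (recall 0.6). Because enhanced graphs can assign several heads to one token, precision and recall can differ, which is why the metric is an F1 score rather than a per-token accuracy.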

While some effort has gone into ensuring that the data in the various treebanks is annotated consistently w.r.t. the level of enhancements and the format of enhanced labels, not all treebanks include all of the enhancements listed in the UD guidelines. For those treebanks, an additional evaluation will be carried out, where dependencies that are the consequence of including enhancement E, where E is not included in the training data of that treebank, are ignored during evaluation.

===== Shared Task Schedule =====

Release of training and dev. data              February 5
Release of blind test data                     March 18
Deadline for submission of parsed test data    April 1
Announcement of results                        April 2
Shared task papers due (provisional)           April 10
Camera ready papers due                        May 18
Presentation of results at IWPT, Seattle       July 9

Shared Task papers will be published in the "Working notes of the IWPT 2020 Enhanced Dependency Parsing Shared Task".

===== Shared task Organizers =====

Gosse Bouma (University of Groningen, Netherlands)
Djamé Seddah (Inria Paris, France)
Daniel Zeman (Charles University, Czechia)

===== IWPT 2020 Organizers =====

Kenji Sagae (General Chair)
Weiwei Sun (Programme Co-Chair)
Anders Søgaard (Programme Co-Chair)
Stephan Oepen (Publicity Chair)
Yuji Matsumoto
Reut Tsarfaty

===== Contact details =====

- Mail: iwptsharedtask at gmail.com
- Webpage: https://universaldependencies.org/iwpt20/
- Mailing list: https://sympa.inria.fr/sympa/info/iwptsharedtask
- IWPT 2020 website: https://iwpt20.sigparse.org
