[Corpora-List] First CfP: Shared Task on the Disambiguation of German Verbal Idioms at KONVENS 2021

Timm Lichte timm.lichte at uni-tuebingen.de
Fri Apr 30 18:02:13 CEST 2021


[apologies for cross-postings]

Shared Task on the Disambiguation of German Verbal Idioms at KONVENS 2021

https://github.com/rafehr/vid-disambiguation-sharedtask

First call for participation

The shared task on the disambiguation of German verbal idioms (VIDs) asks participants to distinguish idiomatic instances of a pre-selected set of German VIDs from their literal counterparts, e.g. /Das Handy fiel ins Wasser/ ('The mobile phone fell into the water') vs. /Das Konzert fiel ins Wasser/ ('The concert was cancelled'). This kind of disambiguation is an implicit or explicit step in VID identification, and it is a well-known challenge for NLP applications such as parsing or machine translation. The difficulty is that literal readings of such expressions are quite rare relative to idiomatic ones. One of our goals was therefore to alleviate this issue by providing a corpus with a lower-than-usual idiomaticity rate, which allows for the training and evaluation of classifiers able to disambiguate VIDs from their literal counterparts.

#### Shared task website

Besides our official GitHub repository (https://github.com/rafehr/vid-disambiguation-sharedtask), we will use CodaLab for the shared task. The CodaLab site will go online no later than the publication of the training data (May 15, 2021). We will add a link to it on our GitHub repository and on the KONVENS 2021 website (https://konvens2021.phil.hhu.de/shared-tasks/) as soon as it is ready. On the CodaLab site you will find all the information needed to participate.

#### Publication

Shared task participants will be invited to submit a system description paper which, upon acceptance, will be published in the shared task proceedings on konvens.org. Acceptance depends on the quality of the paper rather than on the results obtained in the shared task.

#### Data

The shared task data consists of 9906 instances of a set of German VID types or their literal counterparts in context. The set of VID types was pre-selected, so the data constitutes a lexical sample data set. It is a merger of the COLF-VID data set (https://www.aclweb.org/anthology/2020.figlang-1.29.pdf) and the German SemEval-2013 task 5b data set (https://www.aclweb.org/anthology/S13-2007.pdf). The data will be uploaded to our official GitHub repository: https://github.com/rafehr/vid-disambiguation-sharedtask. The trial data has already been published. The training data will be released on May 15, 2021, and the test data will be ready on June 23, 2021, which also marks the start of the evaluation phase.

#### Evaluation

Participating teams will be required to submit the test data with the predictions made by their systems; that is, they submit their results, not the systems themselves. The predictions will be compared to the gold data. The evaluation will focus on the minority class of literal readings. Furthermore, we will include unseen VID types in the dev and test sets to challenge the systems' generalization capabilities. As mentioned above, we will use CodaLab for the shared task, so the predictions will be submitted to the CodaLab site, where they will be evaluated automatically.
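To illustrate what an evaluation focused on the literal class could look like, here is a minimal sketch in Python. It assumes that precision, recall and F1 are computed with the literal reading as the target class; the announcement does not specify the official metric. The label strings `"literal"` and `"idiomatic"`, the function name `literal_class_scores` and the toy data are hypothetical and do not reflect the shared task's actual data format or scorer.

```python
from typing import Dict, Sequence


def literal_class_scores(
    gold: Sequence[str], pred: Sequence[str], target: str = "literal"
) -> Dict[str, float]:
    """Precision, recall and F1 for a single target class.

    Assumption: the label names are placeholders, not the official tag set.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    # True positives: literal in gold and predicted literal.
    tp = sum(1 for g, p in pairs if g == target and p == target)
    # False positives: predicted literal, but gold says otherwise.
    fp = sum(1 for g, p in pairs if g != target and p == target)
    # False negatives: literal in gold, but predicted otherwise.
    fn = sum(1 for g, p in pairs if g == target and p != target)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example with made-up labels (not shared task data).
gold = ["idiomatic", "literal", "idiomatic", "literal", "idiomatic"]
pred = ["idiomatic", "literal", "literal", "idiomatic", "idiomatic"]
scores = literal_class_scores(gold, pred)
print(scores)
```

Scoring only the minority class means that a system which always predicts the idiomatic reading gets an F1 of 0 here, even though its overall accuracy on idiom-heavy data would look high.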

#### Important Dates

- Trial data ready: April 23, 2021
- Training data ready: May 15, 2021
- Test data ready: June 23, 2021
- Evaluation end: June 30, 2021
- Paper submission due: July 15, 2021
- Camera ready due: August 10, 2021
- KONVENS 2021: September 6-10, 2021

#### Organizing Team

Rafael Ehren, Laura Kallmeyer, Timm Lichte and Jakub Waszczuk

Contact: vid.disambiguation2021 at gmail.com

--
Dr. Timm Lichte
Coordinator
Department of Computer Science
University of Tübingen
http://informatik.uni-tuebingen.de
phone +49 (0)7071 29-70423
fax +49 (0)7071 29-571


