[Corpora-List] SemEval 2022 Shared Task 1 - CoDWoE: Comparing Dictionaries and Word Embeddings

Timothee Mickus tmickus at atilf.fr
Thu Sep 23 15:14:48 CEST 2021


[Apologies for cross-posting]

Do you work with text generation or word embeddings? We invite everyone to explore whether a word embedding can be transformed into a short informative text (a word definition, or gloss) and vice versa.

CoDWoE, Task 1 at SemEval 2022, aims to compare two types of semantic descriptions: dictionary definitions and word embedding representations. Are these two types of representation equivalent? Can we generate one from the other? To study these questions, we propose two subtracks: a definition modeling track, where participants have to generate definitions from vectors, and a reverse dictionary track, where participants have to generate vectors from definitions. Both tracks are available for English, Spanish, French, Italian and Russian.

These two tracks have several interesting characteristics. They are relevant to explainable AI, since they involve converting human-readable data into machine-readable data and back. They also have theoretical significance: definitions and word embeddings are both representations of meaning, so each track amounts to converting between two distinct non-formal semantic representations. From a practical point of view, the ability to infer word embeddings from dictionary resources, or dictionaries from large unannotated corpora, would be a boon for many under-resourced languages.

Here are the key dates participants should keep in mind:

* September 3, 2021: Training data & development data made available

* January 10, 2022: Evaluation data made available & evaluation start

* January 31, 2022: Evaluation end

* February 23, 2022: Paper submission due

* March 31, 2022: Notification to authors

To get started:

* register on the CodaLab competition: https://competitions.codalab.org/competitions/34022

* join the Discord server: https://discord.gg/y8g6qXakNs

* join the Google group: send an email to semeval2022-dictionaries-and-word-embeddings+subscribe at googlegroups.com

* download the competition data and starter code: https://git.atilf.fr/tmickus/codwoe/-/tree/master/

Best regards,

CoDWoE organizers:

Timothee Mickus, ATILF, University of Lorraine / CNRS

Kees van Deemter, Utrecht University

Mathieu Constant, ATILF, University of Lorraine / CNRS

Denis Paperno, Utrecht University
