This shared task focuses on multiword expressions (MWEs) and, in addition to
the MWE community, Subtask B might be of particular interest to those working
on language models and semantic text similarity.
All participating teams will be invited to submit a task description paper for the proceedings published by ACL.

================================================
FINAL CALL FOR PARTICIPATION
SemEval 2022 Task 2 Multilingual Idiomaticity Detection and Sentence Embedding
Task Page: https://sites.google.com/view/semeval2022task2-idiomaticity
Google Group: https://groups.google.com/g/semeval-2022-task-2-mwe
This SemEval 2022 task aims to encourage the development of methods for better identification and representation of idiomatic multiword expressions (MWEs).
By and large, composing word representations has been successful in capturing the meaning of sentences. However, there is an important set of phrases, those which are idiomatic, which are inherently not compositional. Early attempts to represent idiomatic phrases in non-contextual embeddings involved extracting frequently occurring n-grams from text (such as “big fish”) and then learning a representation of each phrase based on its context. However, because of data sparsity, the effectiveness of this method drops off significantly as the length of the idiomatic phrase increases. More recent studies show that even state-of-the-art pre-trained contextual models (e.g. BERT) cannot accurately represent idiomatic expressions.
Given this shortcoming in existing state-of-the-art models, this task (part of SemEval 2022) is aimed at detecting and representing multiword expressions (MWEs) which are potentially idiomatic phrases across English, Portuguese and Galician. This task consists of two subtasks, each available in two "settings".
Participants are free to choose a subset of subtasks or settings to participate in (see the sections on each subtask for details), but cannot pick a subset of languages.
This task consists of two subtasks:

Subtask A
A binary classification task aimed at determining whether a sentence contains an idiomatic expression.

Subtask B
This novel subtask requires models to output the correct Semantic Text Similarity (STS) scores for sentence pairs, whether or not either sentence contains an idiomatic expression. Participants must submit STS scores which range between 0 (least similar) and 1 (most similar). This will require models to correctly encode the meaning of idiomatic phrases, such that the encoding of a sentence containing an idiomatic phrase (e.g. Who will he start a program with and will it lead to his own *swan song*?) and the same sentence with the idiomatic phrase replaced by a (literal) paraphrase (e.g. Who will he start a program with and will it lead to his own *final performance*?) are semantically similar to each other and equally similar to any other sentence.
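The consistency requirement described above can be illustrated with a minimal Python sketch. It is not the official evaluation script: the bag-of-words encoder, the function names (bag_of_words, sts_score, consistency_report) and the third sentence OTHER are all assumptions made up for illustration. The sketch scores the idiomatic sentence and its paraphrase against each other and against a third sentence, then reports how far the two scores against that third sentence drift apart.

```python
import math
import re
from collections import Counter

# The two example sentences from the task description (emphasis markers removed).
IDIOMATIC = "Who will he start a program with and will it lead to his own swan song?"
PARAPHRASE = "Who will he start a program with and will it lead to his own final performance?"
# Hypothetical third sentence, invented for this illustration.
OTHER = "It was the final performance of the tour."


def bag_of_words(sentence):
    """Toy encoder (an assumption): counts of lowercase word tokens."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sts_score(s1, s2, encode=bag_of_words):
    """Similarity clamped to the 0 (least similar) .. 1 (most similar) range."""
    return min(1.0, max(0.0, cosine(encode(s1), encode(s2))))


def consistency_report(idiomatic, paraphrase, other, encode=bag_of_words):
    """Compare the idiomatic sentence and its paraphrase against a third sentence.

    A model that encodes the idiom correctly should give a high "pair" score
    and a "gap" close to zero.
    """
    idiom_vs_other = sts_score(idiomatic, other, encode)
    paraphrase_vs_other = sts_score(paraphrase, other, encode)
    return {
        "pair": sts_score(idiomatic, paraphrase, encode),
        "idiom_vs_other": idiom_vs_other,
        "paraphrase_vs_other": paraphrase_vs_other,
        "gap": abs(idiom_vs_other - paraphrase_vs_other),
    }


if __name__ == "__main__":
    for name, value in consistency_report(IDIOMATIC, PARAPHRASE, OTHER).items():
        print(f"{name}: {value:.3f}")
```

A purely surface-level encoder such as this one treats "swan song" and "final performance" as unrelated tokens, so it produces a non-zero gap. Swapping in a different encode function lets the same check be run against any sentence embedding model.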
[NOW AVAILABLE] Training data available: September 3, 2021
[NOW AVAILABLE] Evaluation data released: January 10, 2022
Evaluation end: January 31, 2022
Paper submissions due: February 23, 2022
Notification to authors: March 31, 2022
Harish Tayyar Madabushi, University of Sheffield, UK.
Edward Gow-Smith, University of Sheffield, UK.
Marcos Garcia, Universidade de Santiago de Compostela, Spain.
Carolina Scarton, University of Sheffield, UK.
Marco Idiart, Federal University of Rio Grande do Sul, Brazil.
Aline Villavicencio, University of Sheffield, UK.
For more information, see: https://sites.google.com/view/semeval2022task2-idiomaticity
Dr Harish Tayyar Madabushi, Postdoctoral Researcher
University of Sheffield
Western Bank
Sheffield S10 2TN
e (o): H.TayyarMadabushi at sheffield.ac.uk
w: https://www.harishtayyarmadabushi.com
t: @harish