[Corpora-List] PARSEME Shared Task 1.1 - final call for participation and DEADLINE EXTENSION

Carlos Ramisch carlinhosramisch at gmail.com
Thu May 3 21:59:07 CEST 2018

*** FINAL CALL FOR PARTICIPATION ***

Shared task on automatic identification of verbal multiword expressions – edition 1.1
http://multiword.sourceforge.net/sharedtask2018

DEADLINE EXTENDED: MAY 8, 2018
=======================================================================

*Apologies for cross-posting*

The second edition of the PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs) aims at identifying verbal MWEs in running texts. Verbal MWEs include, among others, idioms (*to let the cat out of the bag*), light verb constructions (*to make a decision*), verb-particle constructions (*to give up*), multi-verb constructions (*to make do*) and inherently reflexive verbs (*se suicider* 'to commit suicide' in French). Their identification is a well-known challenge for NLP applications. This is due to their complex characteristics, which include discontinuity, non-compositionality, heterogeneity and syntactic variability.

We ask potential participant teams to register using the expression of interest form:
https://docs.google.com/forms/d/e/1FAIpQLSd6L8IntkNKXbMp8QVLLvCYzzhoH-_8ovSW0DL3BtYGNnsFhA/viewform?c=0&w=1

Task updates and questions will be posted to our public mailing list:
http://groups.google.com/group/verbalmwe

More details on the annotated corpora can be found here:
https://typo.uni-konstanz.de/parseme/index.php/2-general/202-parseme-shared-task-on-automatic-identification-of-verbal-mwes-edition-1-1

The annotation guidelines used for the manual annotation of the training and test sets are available here:
http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.1

Publication and workshop
------------------------

Shared task participants will be invited to submit a system description paper to a special track of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018) at COLING 2018. The workshop will be held on August 25-26, 2018, in Santa Fe, New Mexico, USA:
http://multiword.sourceforge.net/lawmwecxg2018

Submitted system description papers must follow the workshop submission instructions. They will go through double-blind peer review by other participants and by selected members of the LAW-MWE-CxG-2018 program committee. Acceptance depends on the quality of the paper rather than on the results obtained in the shared task. Authors of accepted papers will present their work as posters/demos in a dedicated session of the workshop, which is collocated with COLING 2018. Submitting a system description paper is not mandatory.

Because reviewing is double-blind, participants are asked to give their systems a nickname (i.e., a name that does not identify the authors, universities, research groups, etc.). This nickname should be used both when submitting results and in the submitted papers.

Provided data
-------------

For each language, we provide participants with corpora in which VMWEs are annotated according to universal guidelines:

* Manually annotated **training corpora**, made available to participants in advance so that they can train their systems.
* Manually annotated **development corpora**, also made available in advance so that participants can tune/optimize their systems' parameters.
* Raw (unannotated) **test corpora**, to be used as input to the systems during the evaluation phase.
  The VMWE annotations in these corpora will be kept secret.

The training and development sets are available at:
https://gitlab.com/parseme/sharedtask-data/tree/master/1.1

When available, morphosyntactic data (parts of speech, lemmas, morphological features and/or syntactic dependencies) are also provided. Depending on the language, this information comes either from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe).

We have prepared corpora for the following languages: Arabic (AR), Bulgarian (BG), German (DE), Greek (EL), English (EN), Spanish (ES), Basque (EU), Farsi (FA), French (FR), Hebrew (HE), Hindi (HI), Croatian (HR), Hungarian (HU), Italian (IT), Lithuanian (LT), Polish (PL), Brazilian Portuguese (PT), Romanian (RO), Slovene (SL), Turkish (TR).

The amount of annotated data varies from language to language.

Tracks
------

System results can be submitted in two tracks:

* **Closed track**: systems that use only the provided training data (VMWE annotations plus morphosyntactic data, if any) to learn VMWE identification models and/or rules.
* **Open track**: systems that may use the provided training data as well as any additional resources deemed useful (MWE lexicons, symbolic grammars, wordnets, raw corpora, word embeddings, language models trained on external data, etc.). Notably, this track includes purely symbolic and rule-based systems.

At submission time, teams submitting systems in the open track will be asked to describe all the resources they used and to provide references for them. Teams are encouraged to favor freely available resources, for better reproducibility of their results.

Teams can submit their results as an archive file at this link:
https://www.softconf.com/coling2018/ws-LAW-MWE-CxG-2018

Each team can upload up to 2 submissions per track, i.e. up to 4 in total.

Evaluation metrics
------------------

Participants will provide the output produced by their systems on the test corpora. This output will be compared with the gold standard (ground truth). Further details on the evaluation metrics can be found here:
http://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_04_LAW-MWE-CxG_2018&subpage=CONF_50_Evaluation_metrics

Important dates
---------------

* April 4, 2018: shared task training data released
* April 30, 2018: shared task blind test data released
* May 8, 2018: submission of system results (EXTENDED!)
* May 11, 2018: announcement of results
* May 25, 2018: submission of system description papers
* June 20, 2018: notification
* June 30, 2018: camera-ready papers
* August 25-26, 2018: LAW-MWE-CxG-2018 workshop, including the shared task track, collocated with COLING 2018

Organizing team
---------------

Silvio Ricardo Cordeiro, Carlos Ramisch, Agata Savary, Veronika Vincze

Contact: parseme-st-core at nlp.ipipan.waw.pl
