[Corpora-List] Release of the PARSEME Corpus of Verbal Multiword Expressions

Agata Savary agata.savary at univ-tours.fr
Wed Jun 28 23:35:00 CEST 2017


Dear all,

We are pleased to announce the release of the PARSEME Corpus of Verbal Multiword Expressions:

http://hdl.handle.net/11372/LRT-2282

It is the outcome of a considerable collective effort made within the PARSEME network by 18 language teams. The corpus contains texts in 18 languages manually annotated with verbal multiword expressions using universal guidelines. It was used as training and test data in the PARSEME shared task <http://multiword.sourceforge.net/sharedtask2017> on automatic identification of verbal multiword expressions. The corpus is freely available under various flavours of Creative Commons licences.

Enjoy!

PARSEME shared task organizers

------------------------------------------------

Resource name:

PARSEME Corpus of Verbal Multiword Expressions (version 1.0)

Type of resource:

manually annotated corpus and the associated tools

Languages:

Bulgarian, Czech, Farsi, French, German, Modern Greek, Hebrew, Hungarian, Italian, Lithuanian, Maltese, Polish, Romanian, Brazilian Portuguese, Slovenian, Spanish, Swedish, and Turkish

Size:

275 thousand sentences, 5.5 million tokens, 54 thousand annotated VMWEs

see also per-language statistics <http://multiword.sourceforge.net/PHITE.php?sitesig=CONF&page=CONF_05_MWE_2017___lb__EACL__rb__&subpage=CONF_40_Shared_Task#data>

Format:

parsemetsv <https://typo.uni-konstanz.de/parseme/index.php/2-general/184-parseme-shared-task-format-of-the-final-annotation>, inspired by the CoNLL-U <http://universaldependencies.org/format.html> format

Annotation schema:

follows the universal annotation guidelines <http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.0/>, elaborated for 21 languages

Features:

for most languages, aligned companion files with morphological and/or syntactic data in the CoNLL-U <http://universaldependencies.org/format.html> format are available

License:

various flavours of the Creative Commons license, see license per language <https://lindat.mff.cuni.cz/repository/xmlui/page/licence-mwe-1.0>

Download link:

http://hdl.handle.net/11372/LRT-2282

Publisher:

PARSEME network

Authors:

Agata Savary (France), Carlos Ramisch (France), Silvio Ricardo Cordeiro (France, Brazil), Federico Sangati (Italy), Veronika Vincze (Hungary), Behrang QasemiZadeh (Germany), Marie Candito (France), Fabienne Cap (Sweden), Voula Giouli (Greece), Ivelina Stoyanova (Bulgaria), Antoine Doucet (France), Kübra Adalı (Turkey), Verginica Barbu Mititelu (Romania), Eduard Bejček (Czech Republic), Ismail El Maarouf (UK), Gülşen Eryiğit (Turkey), Luke Galea (Malta), Yaakov Ha-Cohen Kerner (Israel), Jolanta Kovalevskaitė (Lithuania), Simon Krek (Slovenia), Chaya Liebeskind (Israel), Johanna Monti (Italy), Carla Parra Escartín (Spain), Lonneke van der Plas (Malta), Cristina Aceta (Spain), Itziar Aduriz (Spain), Jean-Yves Antoine (France), Greta Attard (Malta), Kirsty Azzopardi (Malta), Loic Boizou (Lithuania), Janice Bonnici (Malta), Mert Boz (Turkey), Ieva Bumbulienė (Lithuania), Jael Busuttil (Malta), Valeria Caruso (Italy), Manuela Cherchi (Italy), Matthieu Constant (France), Monika Czerepowicka (Poland), Anna De Santis (Italy), Tsvetana Dimitrova (Bulgaria), Tutkum Dinç (Turkey), Hevi Elyovich (Israel), Ray Fabri (Malta), Alison Farrugia (Malta), Jamie Findlay (UK), Aggeliki Fotopoulou (Greece), Vassiliki Foufi (Greece), Sara Anne Galea (Malta), Polona Gantar (Slovenia), Albert Gatt (Malta), Anabelle Gatt (Malta), Carlos Herrero (Spain), Uxoa Iñurrieta (Spain), Glorianna Jagfeld (Germany), Milena Hnátková (Czech Republic), Mihaela Ionescu (Romania), Natalia Klyueva (Czech Republic), Svetla Koeva (Bulgaria), Viktória Kovács (Hungary), Taja Kuzman (Slovenia), Svetlozara Leseva (Bulgaria), Sevi Louisou (Greece), Teresa Lynn (UK), Ruth Malka (Israel), Héctor Martínez Alonso (Spain), John McCrae (UK), Helena de Medeiros Caseli (Brazil), Ayşenur Miral (Turkey), Amanda Muscat (Malta), Joakim Nivre (Sweden), Michael Oakes (UK), Mihaela Onofrei (Romania), Yannick Parmentier (France), Caroline Pasquer (France), Maria Pia di Buono (Italy), Belem Priego Sanchez (Spain), Annalisa Raffone (Italy), Renata Ramisch (Brazil), Erika Rimkutė (Lithuania), Monica-Mihaela Rizea (Romania), Katalin Simkó (Hungary), Michael Spagnol (Malta), Valentina Stefanova (Bulgaria), Sara Stymne (Sweden), Umut Sulubacak (Turkey), Nicole Tabone (Malta), Marc Tanti (Malta), Maria Todorova (Bulgaria), Zdenka Urešová (Czech Republic), Aline Villavicencio (Brazil), Leonardo Zilio (Brazil)

------------------------------------------------
