[Corpora-List] Multilingual Event Extraction Corpus

Hristo Tanev htanev at yahoo.co.uk
Tue Apr 12 22:13:28 CEST 2016


Multilingual Corpus of Events

We are happy to present MEVEX - a multilingual corpus of news articles annotated with event metadata. The corpus was created by the NLP group of the University of West Bohemia in collaboration with the Joint Research Centre, EC.

The events in the corpus come from the domains of violence and of natural and man-made disasters. The event annotation follows an event taxonomy, which is published together with the corpus. Currently, MEVEX encompasses 109 topics. Each topic contains comparable articles in different languages taken from Wikinews. In total, there are 342 articles in 14 languages, with the best coverage for Czech and English.

Possible uses of this corpus include (but are not limited to):
- Training a multilingual lexicon of patterns for detecting the dead, injured, and kidnapped, as well as perpetrators
- Learning statistical models for detecting event classes. For example, training a classical lexical model with standard ML methods such as SVM, or using the corpus articles as labelled examples and applying k-nearest-neighbour classification to unseen articles
- Multilingual evaluation of event extraction systems
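
The k-nearest-neighbour use mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: the toy articles, the label names, and the helper functions (bag_of_words, cosine, knn_classify) are all hypothetical and are not taken from MEVEX or its taxonomy, and whitespace tokenisation with cosine similarity is just one simple choice of lexical model.

```python
import math
from collections import Counter

# Hypothetical toy data: (article text, event class). These are NOT MEVEX
# articles, and the labels are NOT the classes of the MEVEX taxonomy.
TOY_ARTICLES = [
    ("bomb explosion kills ten in market attack", "violence"),
    ("gunmen attack village and kidnap residents", "violence"),
    ("earthquake destroys homes and injures hundreds", "natural_disaster"),
    ("flood waters force thousands to evacuate homes", "natural_disaster"),
    ("train derailment injures passengers after signal failure", "man_made_disaster"),
    ("factory fire caused by faulty wiring kills workers", "man_made_disaster"),
]


def bag_of_words(text):
    """Lowercase, whitespace-tokenised term counts (an assumed, very simple lexical model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, labelled_articles, k=3):
    """Return the majority label among the k labelled articles most similar to text."""
    query = bag_of_words(text)
    ranked = sorted(
        labelled_articles,
        key=lambda item: cosine(query, bag_of_words(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_classify("strong earthquake injures dozens and destroys homes", TOY_ARTICLES, k=1))
```

With real data, the toy list would be replaced by (text, event class) pairs read from the downloaded corpus, and a language-aware tokeniser would likely be needed for the non-English articles.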

The corpus and the event taxonomy can be downloaded at http://nlp.kiv.zcu.cz/projects/mevex



