EuroSense's disambiguation pipeline couples a state-of-the-art graph-based multilingual disambiguation and entity linking system (Babelfy <http://babelfy.org/>) with a language-independent vector representation of concepts and entities (NASARI <http://lcl.uniroma1.it/nasari>). The pipeline is designed to make the best use of the cross-language complementarities of the parallel corpus.
We are *releasing* two versions of the corpus, both stored as XML files. The first version (“high-coverage”) is fully disambiguated for all content words and named entities, with an estimated precision of around 75% for most languages (above 80% for English). The second version (“high-precision”) has lower coverage (around 57%) but higher precision (estimated above 80% for most languages, with peaks above 85%).
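As a possible starting point for working with either XML release, here is a minimal sketch that streams a file with Python's standard `xml.etree.ElementTree.iterparse` and counts sense annotations per language. The element and attribute names (`sentence`, `annotation`, `lang`, `anchor`, `lemma`) and the `bn:...` IDs in the sample are assumptions made for illustration only, not the documented EuroSense schema; check them against the downloaded files before relying on this.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document: tag names, attribute names and the
# placeholder sense IDs are assumptions, not the actual EuroSense schema.
SAMPLE = """<corpus>
  <sentence id="0">
    <text lang="en">The Commission adopted the proposal.</text>
    <text lang="it">La Commissione ha adottato la proposta.</text>
    <annotations>
      <annotation lang="en" anchor="Commission" lemma="commission">bn:00000001n</annotation>
      <annotation lang="en" anchor="proposal" lemma="proposal">bn:00000002n</annotation>
      <annotation lang="it" anchor="Commissione" lemma="commissione">bn:00000001n</annotation>
    </annotations>
  </sentence>
</corpus>"""


def count_annotations(source) -> Counter:
    """Count <annotation> elements per value of their (assumed) 'lang' attribute.

    `source` may be a filename or a binary file object. iterparse streams the
    input, so a large corpus file does not have to be loaded in one go.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "annotation":
            counts[elem.get("lang", "unknown")] += 1
        elif elem.tag == "sentence":
            # All annotations of this sentence have been counted already,
            # so its subtree can be discarded to keep memory usage flat.
            elem.clear()
    return counts


if __name__ == "__main__":
    print(count_annotations(io.BytesIO(SAMPLE.encode("utf-8"))))
```

Running the same function over the high-coverage and high-precision files would give a rough per-language view of the coverage difference described above, provided the assumed tag names match the real ones.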
EuroSense is *available for download* at http://lcl.uniroma1.it/eurosense.
*Reference paper:*
Claudio Delli Bovi, Jose Camacho-Collados, Alessandro Raganato and Roberto Navigli. EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text. In Proceedings of ACL 2017 (short papers), Vancouver, Canada, 2017. <http://lcl.uniroma1.it/eurosense/papers/ACL17.pdf>
Regards,
Claudio Delli Bovi, Jose Camacho-Collados, Alessandro Raganato and Roberto Navigli
Linguistic Computing Laboratory, Sapienza University of Rome
-- CDB
=====================================
Claudio Delli Bovi
Dipartimento di Informatica
Sapienza University of Rome
Viale Regina Elena 295
00161 Roma Italy
Home Page: http://wwwusers.di.uniroma1.it/~dellibovi
=====================================