[Corpora-List] [Release] EuroSense: Multilingual Sense Annotations for Europarl

Claudio Delli Bovi dellibovi at di.uniroma1.it
Tue May 9 11:12:40 CEST 2017

We are pleased to announce the release of *EuroSense*, a multilingual sense-annotated corpus automatically built via the joint disambiguation of *Europarl* <http://opus.lingfil.uu.se/Europarl.php> in *21 languages*, with almost *123 million sense annotations* for over *155,000 distinct concepts and entities*, drawn from the multilingual sense inventory of BabelNet <http://babelnet.org/>.

EuroSense's disambiguation pipeline couples a state-of-the-art graph-based multilingual disambiguation and entity linking system (Babelfy <http://babelfy.org/>) with a language-independent vector representation of concepts and entities (NASARI <http://lcl.uniroma1.it/nasari>). The pipeline is designed to make the most of the cross-language complementarities of the parallel corpus.
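As a rough illustration of why a language-independent vector space helps here, the following sketch picks the candidate sense closest to the centroid of the senses found in the context, which may come from translations of the same sentence. It is a minimal sketch under stated assumptions and not the EuroSense implementation: the function names, the sense identifiers and the two-dimensional toy vectors are all hypothetical, and real NASARI vectors are far larger.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pick_sense(candidates, context_vectors):
    """Return (sense_id, similarity) for the candidate closest to the context.

    candidates: dict mapping a (hypothetical) sense id to its vector.
    context_vectors: vectors of senses already found in the context; because
    the space is language-independent, they can be pooled across languages.
    """
    target = centroid(context_vectors)
    scored = ((sense_id, cosine(vec, target)) for sense_id, vec in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Toy data: hypothetical ids and vectors, for illustration only.
context = [[1.0, 0.0], [0.8, 0.2]]
candidates = {"bank_finance": [1.0, 0.1], "bank_river": [0.0, 1.0]}
best_id, best_score = pick_sense(candidates, context)
```

With these toy vectors the financial sense is selected, since its vector points in nearly the same direction as the context centroid.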

We *release* two different versions of the corpus, both stored in XML files. The first version (“high-coverage”) has been fully disambiguated for all content words and named entities, with an estimated precision of around 75% for most languages (above 80% for English). The second version (“high-precision”) has reduced coverage (around 57%) but higher precision (estimated above 80% for most languages, with peaks above 85%).
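The trade-off between the two versions can be pictured as thresholding annotations on a confidence score: dropping low-scoring annotations lowers coverage and tends to raise precision. The sketch below assumes each annotation carries a hypothetical `score` field and a hand-checked `correct` flag; neither name is taken from the released XML format, and the four toy annotations are invented purely to show the arithmetic.

```python
def filter_by_score(annotations, threshold):
    """Keep only annotations whose (hypothetical) score reaches the threshold."""
    return [a for a in annotations if a["score"] >= threshold]


def coverage(kept, total_targets):
    """Fraction of target words (content words, entities) still annotated."""
    return len(kept) / total_targets if total_targets else 0.0


def precision(kept):
    """Fraction of kept annotations judged correct in a manual evaluation."""
    return sum(a["correct"] for a in kept) / len(kept) if kept else 0.0


# Toy annotations, invented for illustration only.
annotations = [
    {"score": 0.9, "correct": True},
    {"score": 0.8, "correct": True},
    {"score": 0.4, "correct": False},
    {"score": 0.3, "correct": True},
]

full = filter_by_score(annotations, 0.0)     # keep everything
strict = filter_by_score(annotations, 0.5)   # keep confident annotations only

full_stats = (coverage(full, len(annotations)), precision(full))
strict_stats = (coverage(strict, len(annotations)), precision(strict))
```

On this toy data the unfiltered set has full coverage at 75% precision, while the filtered set keeps half of the annotations, all of them correct.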

EuroSense is *available for download* at http://lcl.uniroma1.it/eurosense

*Reference paper:*

Claudio Delli Bovi, Jose Camacho-Collados, Alessandro Raganato and Roberto Navigli. EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text. <http://lcl.uniroma1.it/eurosense/papers/ACL17.pdf> In Proceedings of ACL 2017 (short), Vancouver, Canada, 2017.


Claudio Delli Bovi, Jose Camacho-Collados, Alessandro Raganato and Roberto Navigli
Linguistic Computing Laboratory, Sapienza University of Rome

-- CDB

=====================================
Claudio Delli Bovi
Dipartimento di Informatica
Sapienza University of Rome
Viale Regina Elena 295
00161 Roma Italy
Home Page: http://wwwusers.di.uniroma1.it/~dellibovi
=====================================
