We are pleased to announce the release of *EuroSense*, a multilingual sense-annotated corpus automatically built via the joint disambiguation of *Europarl* <http://opus.lingfil.uu.se/Europarl.php> in *21 languages*, with almost *123 million sense annotations* for over *155,000 distinct concepts and entities*, drawn from the multilingual sense inventory of BabelNet <http://babelnet.org/>.

EuroSense's disambiguation pipeline couples a state-of-the-art graph-based multilingual disambiguation and entity linking system (Babelfy <http://babelfy.org/>) with a language-independent vector representation of concepts and entities (NASARI <http://lcl.uniroma1.it/nasari>). The pipeline is designed to make the most of the cross-language complementarities of the parallel corpus.

We *release* two versions of the corpus, both stored in XML files. The first version (“high-coverage”) is fully disambiguated for all content words and named entities, with an estimated precision of around 75% for most languages (above 80% for English). The second version (“high-precision”) has reduced coverage (around 57%) but higher precision (estimated above 80% for most languages, with peaks above 85%).
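
For readers who want a quick look inside the XML files, the following is a minimal Python sketch that streams a file and counts sense annotations per language. The element and attribute names (`sentence`, `annotation`, `lang`, `anchor`) and the sense identifiers in the sample are assumptions made for illustration, not the documented EuroSense schema; please check them against the files you download.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: element names, attribute names and sense IDs are
# placeholders for illustration, not the documented EuroSense schema.
SAMPLE = b"""<corpus>
  <sentence id="0">
    <text lang="en">The committee approved the report.</text>
    <text lang="it">La commissione ha approvato la relazione.</text>
    <annotations>
      <annotation lang="en" anchor="committee">bn:00000001n</annotation>
      <annotation lang="en" anchor="report">bn:00000002n</annotation>
      <annotation lang="it" anchor="commissione">bn:00000001n</annotation>
    </annotations>
  </sentence>
</corpus>"""


def count_annotations(source, tag="annotation", lang_attr="lang"):
    """Stream an XML source and count annotation elements per language."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            counts[elem.get(lang_attr, "unknown")] += 1
        elif elem.tag == "sentence":
            # Assumed container element: free its children once it is complete.
            elem.clear()
    return counts


counts = count_annotations(io.BytesIO(SAMPLE))
print(dict(counts))
```

Streaming with `iterparse` rather than loading the whole tree is a choice made here because the corpus is large; to read a downloaded file, pass its path (or an open file object) in place of the in-memory sample.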

EuroSense is *available for download* at http://lcl.uniroma1.it/eurosense

*Reference paper:*

Claudio Delli Bovi, Jose Camacho-Collados, Alessandro Raganato and Roberto Navigli. EuroSense: Automatic Harvesting of Multilingual Sense Annotations from Parallel Text. <http://lcl.uniroma1.it/eurosense/papers/ACL17.pdf> In Proceedings of ACL 2017 (short), Vancouver, Canada, 2017.


Claudio Delli Bovi, Jose Camacho-Collados, Alessandro Raganato and Roberto Navigli. Linguistic Computing Laboratory, Sapienza University of Rome

