We release two versions of the corpus, both stored in easy-to-process XML files divided by language and resource. The first version (“complete”) has been fully disambiguated for all content words and named entities, with an estimated precision above 75% for most languages. The second version (“high-precision”) has reduced coverage (around 65% for all content words and 75% for noun instances) but higher precision (estimated above 90%).
All the resources are freely available for download at http://lcl.uniroma1.it/disambiguated-glosses
José Camacho-Collados, Claudio Delli Bovi, Alessandro Raganato and Roberto Navigli. A Large-Scale Multilingual Disambiguation of Glosses. In Proceedings of LREC 2016 (to appear), Portorož, Slovenia, 23-28 May 2016.
José Camacho Collados, Claudio Delli Bovi, Alessandro Raganato and Roberto Navigli.
Linguistic Computing Laboratory, Sapienza University of Rome
--
José Camacho Collados
Linguistic Computing Laboratory (LCL)
Sapienza University of Rome
http://wwwusers.di.uniroma1.it/~collados/