SEW can be used both as a large-scale Wikipedia-based semantic network and as a sense-tagged dataset with more than *200 million* annotations of over *4 million* different concepts and named entities.
We release two different versions of the corpus, both created from the Wikipedia dump of November 2014, and stored in easy-to-process XML files:
- A "complete" version, with every discovered annotation (including duplicates and overlapping mentions);
- A "conservative" version, with only one sense annotation per tagged mention and no overlap.
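To make the difference between the two versions concrete, here is a minimal Python sketch. It is not the procedure used to build SEW: the tuple layout (start, end, sense), the example sense labels, and the "prefer the longest mention" rule are all assumptions made for illustration, and the actual XML schema is the one documented with the download.

```python
from typing import List, Tuple

# Hypothetical annotation layout: (start_token, end_token_exclusive, sense_label).
Annotation = Tuple[int, int, str]


def conservative_view(annotations: List[Annotation]) -> List[Annotation]:
    """Keep at most one annotation per mention and drop overlapping mentions.

    Assumed rule (illustrative only): longer mentions win, ties go to the
    earlier mention in the text.
    """
    ordered = sorted(annotations, key=lambda a: (-(a[1] - a[0]), a[0]))
    kept: List[Annotation] = []
    for start, end, sense in ordered:
        # A candidate is kept only if it is disjoint from everything kept so far.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, sense))
    return sorted(kept)


# Made-up "complete"-style input with a duplicate and an overlapping mention.
complete = [
    (0, 2, "New_York_City"),
    (0, 2, "New_York_City"),  # duplicate of the first annotation
    (1, 2, "York"),           # overlaps the longer mention
    (4, 5, "Italy"),
]
conservative = conservative_view(complete)
```

With this toy input, only the two non-overlapping mentions survive, each with a single sense.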
We also release two *vector representations* constructed using SEW and used in the extrinsic evaluation of the corpus:
- *WB-SEW*, a vector representation for BabelNet synsets in which dimensions are Wikipedia pages;
- *SB-SEW*, a vector representation for Wikipedia pages in which dimensions are BabelNet synsets.
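As a rough illustration of how such representations can be used, the sketch below stores each vector sparsely as a mapping from dimension name to weight and compares two vectors with cosine similarity. Cosine is a common choice for this kind of vector, but this announcement does not state which measure was used in the evaluation; the dimension names, weights, and the in-memory dictionary format are assumptions and do not reflect the released file format.

```python
import math
from typing import Dict

# Sparse vector: dimension name -> weight. For a WB-SEW-style vector the
# dimensions would be Wikipedia pages; for SB-SEW, BabelNet synsets.
SparseVector = Dict[str, float]


def cosine(u: SparseVector, v: SparseVector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    if not u or not v:
        return 0.0
    # Iterate over the smaller vector to limit dictionary lookups.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(weight * v.get(dim, 0.0) for dim, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up vectors whose dimensions are (hypothetical) Wikipedia page titles.
vec_a = {"Rome": 3.0, "Italy": 4.0}
vec_b = {"Italy": 5.0}
vec_c = {"Jazz": 2.0}

sim_ab = cosine(vec_a, vec_b)  # shares the "Italy" dimension
sim_ac = cosine(vec_a, vec_c)  # no shared dimensions
```

A dictionary-based layout keeps memory proportional to the number of non-zero dimensions, which matters when the dimension space is as large as the set of Wikipedia pages or BabelNet synsets.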
All of the above resources are freely available for download at http://lcl.uniroma1.it/sew
*Reference paper (to appear):*
Alessandro Raganato, Claudio Delli Bovi and Roberto Navigli.
*Automatic Construction and Evaluation of a Large Semantically Enriched Wikipedia.*
Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI-16), New York City, New York, USA, 9-15 July 2016.
Alessandro Raganato, Claudio Delli Bovi, and Roberto Navigli.
Linguistic Computing Laboratory, Sapienza University of Rome
--
=====================================
Alessandro Raganato
Dipartimento di Informatica
Sapienza University of Rome
Viale Regina Elena 295
00161 Roma Italy
Home Page: http://wwwusers.di.uniroma1.it/~raganato
=====================================