*Multilingual All-Words Sense Disambiguation and Entity Linking*
SemEval-2015 task 13
The automatic understanding of the meaning of text has been a major goal of research in computational linguistics and related areas for several decades, with ambitious challenges such as Machine Reading (Etzioni et al., 2006) and the quest for knowledge (Schubert, 2006). Two key Natural Language Processing tasks that need to be tackled as steps towards this goal are Word Sense Disambiguation (WSD) and Entity Linking (EL). WSD (Navigli, 2009) is a long-standing task aimed at explicitly assigning meanings to single-word and multi-word occurrences within text, and it is today more alive than ever in the research community. EL (Erbs et al., 2011; Cornolti et al., 2013; Rao et al., 2013) is a more recent task which aims at discovering mentions of entities within a text and linking them to the most suitable entry in a knowledge base. The two main differences between WSD and EL lie in the kind of inventory used (a dictionary vs. an encyclopedia, respectively) and in the assumption made about the mention: complete in WSD, potentially partial in EL. For instance, a named entity such as “European Medicines Agency” may be referred to within a text as simply “Medicines Agency”, the meaning of which, however, can be inferred thanks to the context. Notwithstanding these differences, the tasks are similar in nature, in that they both involve the disambiguation of textual fragments according to a reference inventory. However, the research community has hitherto tended to tackle the two tasks separately, often duplicating efforts and solutions.
In contrast to this trend, research in knowledge acquisition is heading towards the seamless integration of encyclopedic and lexicographic knowledge within structured language resources (Hovy et al., 2013), and the main representative of this new direction is undoubtedly BabelNet (http://babelnet.org; Navigli and Ponzetto, 2012). These resources therefore seem to provide a common ground for the two tasks of WSD and EL. Only very recently has a joint approach, called Babelfy (http://babelfy.org), been proposed for both tasks (Moro et al., 2014).
In this task, our goal is to promote research in the direction of joint word sense and named entity disambiguation, so as to focus research efforts on the aspects that differentiate these two tasks without duplicating research on the problems they share. However, we will also allow systems that perform only one of the two tasks to participate, and even systems which tackle one particular setting of WSD, such as all-words sense disambiguation or disambiguation restricted to any subset of part-of-speech tags. Moreover, given the recent upsurge of interest in multilingual approaches, we will release our dataset in three different languages (English, Italian, Spanish) on parallel corpora which will be independently and manually annotated by different native/fluent speakers. In contrast to SemEval-2013 task 12, Multilingual Word Sense Disambiguation (Navigli et al., 2013), our aim in this task is to present a dataset covering both kinds of inventories (i.e., named entities and word senses) in the specific domain of biomedicine, in an attempt to further reduce the distance between research efforts regarding the dichotomy EL vs. WSD and those regarding the dichotomy open domain vs. closed domain (i.e., biomedical Information Extraction). For this reason we encourage submissions from all these lines of research, so that we can evaluate the distance between approaches that exploit both kinds of knowledge (i.e., lexicographic and encyclopedic) and approaches that work on both kinds of domain granularity (i.e., open and closed).
*Word Senses and Named Entities inventory*
The evaluation will use BabelNet 2.5, available at http://babelnet.org/, which contains WordNet, Wikipedia, Wiktionary, OmegaWiki, Wikidata and the Open Multilingual WordNet.
*Important Dates*
- Trial data ready: May 30, 2014
- Training data ready: July 30, 2014 (there will be no training data)
- Evaluation period starts: December 5, 2014
- Evaluation period ends: December 20, 2014
- Paper submission due: January 30, 2015
- Paper reviews due: February 28, 2015
- Camera ready due: March 30, 2015
- SemEval workshop: Summer 2015
*Organizers*
- Andrea Moro <http://wwwusers.di.uniroma1.it/~moro/>, *Sapienza University of Rome*;
- Roberto Navigli <http://wwwusers.di.uniroma1.it/~navigli/>, *Sapienza University of Rome*.
Please register with the following Google group:
"SemEval-2015 Task 13: Multilingual all-words WSD and EL <https://groups.google.com/forum/?hl=en#%21forum/semeval-2015-task-13>"
*References*
Marco Cornolti, Paolo Ferragina, and Massimiliano Ciaramita. 2013. A framework for benchmarking entity-annotation systems. In Proc. of WWW, pages 249–260.
Nicolai Erbs, Torsten Zesch, and Iryna Gurevych. 2011. Link discovery: A comprehensive analysis. In Proc. of ICSC, pages 83–86.
Oren Etzioni, Michele Banko, and Michael J Cafarella. 2006. Machine Reading. In Proc. of AAAI, pages 1517–1519.
Eduard H. Hovy, Roberto Navigli, and Simone P. Ponzetto. 2013. Collaboratively built semi-structured content and Artificial Intelligence: The story so far. Artificial Intelligence, 194:2–27.
Andrea Moro, Alessandro Raganato, and Roberto Navigli. 2014. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics, 2:231–244.
Roberto Navigli. 2009. Word Sense Disambiguation: A survey. ACM Computing Surveys, 41(2):1–69.
Roberto Navigli, David Jurgens, and Daniele Vannella. 2013. SemEval-2013 Task 12: Multilingual Word Sense Disambiguation. In Proc. of SemEval-2013, pages 222–231.
Roberto Navigli and Simone Paolo Ponzetto. 2012. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217–250.
Delip Rao, Paul McNamee, and Mark Dredze. 2013. Entity Linking: Finding Extracted Entities in a Knowledge Base. In Multi-source, Multilingual Information Extraction and Summarization, Theory and Applications of Natural Language Processing, pages 93–115. Springer Berlin Heidelberg.
Lenhart K. Schubert. 2006. Turing’s dream and the knowledge challenge. In Proc. of NCAI, pages 1534–1538.