Call for Participation
SemEval-2010 Shared Task #8:
Multi-Way Classification of Semantic Relations
Between Pairs of Nominals
--- Trial data available ---
This shared task should be of interest to researchers working on
* semantic relation extraction
* information extraction
* lexical semantics
Recently, the NLP community has shown a renewed interest in deeper semantic analysis, including automatic recognition of semantic relations between pairs of words. This is an important task with many potential applications in Information Retrieval, Information Extraction, Text Summarization, Machine Translation, Question Answering, Paraphrasing, Recognizing Textual Entailment, Thesaurus Construction, Semantic Network Construction, Word Sense Disambiguation, and Language Modelling.
Despite this interest, progress has been slow, due in part to the incompatibility of the various classification schemes proposed and used, which made it difficult to compare classification algorithms. Moreover, most of the datasets used so far provided no context for the target relation, thus relying on the unrealistic assumption that semantic relations are largely context-independent. A notable exception is SemEval-2007 Task 4: Classification of Semantic Relations between Nominals, which for the first time provided a standard benchmark dataset for seven semantic relations *in context*. However, to avoid the challenge of defining a single unified classification scheme, that dataset treated each semantic relation separately, as a two-class (positive vs. negative) classification task, rather than as multi-way classification. While some subsequent publications tried to use the dataset in a multi-way setup, it was not designed for that purpose.
We believe that a freely available standard benchmark dataset for *multi-way* semantic relation classification *in context* is much needed for the overall advancement of the field. Our primary objective is therefore the challenging task of preparing and releasing such a dataset to the research community. We have further set up a common evaluation task that will enable researchers to compare their algorithms.
The Task
========
Task: Given a sentence and two annotated nominals, choose the most suitable relation from the following inventory of nine relations:
* Relation 1 (Cause-Effect)
* Relation 2 (Instrument-Agency)
* Relation 3 (Product-Producer)
* Relation 4 (Content-Container)
* Relation 5 (Entity-Origin)
* Relation 6 (Entity-Destination)
* Relation 7 (Component-Whole)
* Relation 8 (Member-Collection)
* Relation 9 (Message-Topic)
It is also possible to choose Other if none of the nine relations appears to be suitable.
Example: The best choice for the following sentence would be Component-Whole(e1,e2):
"The <e1>macadamia nuts</e1> in the <e2>cake</e2> also make it necessary to have a very sharp knife to cut through the cake neatly."
Note that in the above sentence, Component-Whole(e1,e2) holds, but Component-Whole(e2,e1) does not, i.e., we have Other(e2,e1). Thus, the task requires determining *both* the relation and the order of its arguments e1 and e2.
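As a rough illustration of the markup and the resulting label space, here is a minimal Python sketch. The parsing helper and the label-naming scheme are our own illustration, not part of the official task release or scorer; counting argument order, a system must choose among 9 * 2 + 1 = 19 labels.

```python
import re

# Illustrative helper (not part of the official release): extract the two
# annotated nominals from a sentence in the task's <e1>/<e2> markup.
def parse_example(sentence):
    e1 = re.search(r"<e1>(.*?)</e1>", sentence).group(1)
    e2 = re.search(r"<e2>(.*?)</e2>", sentence).group(1)
    return e1, e2

# The nine relations of the inventory; with argument order distinguished,
# plus Other, the full label set has 9 * 2 + 1 = 19 entries.
RELATIONS = [
    "Cause-Effect", "Instrument-Agency", "Product-Producer",
    "Content-Container", "Entity-Origin", "Entity-Destination",
    "Component-Whole", "Member-Collection", "Message-Topic",
]
LABELS = ([f"{r}(e1,e2)" for r in RELATIONS]
          + [f"{r}(e2,e1)" for r in RELATIONS]
          + ["Other"])

sentence = ("The <e1>macadamia nuts</e1> in the <e2>cake</e2> also make it "
            "necessary to have a very sharp knife to cut through the cake "
            "neatly.")
print(parse_example(sentence))  # ('macadamia nuts', 'cake')
print(len(LABELS))              # 19
```

A system's output for this sentence would then be the single label "Component-Whole(e1,e2)" drawn from that 19-way set.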
* Trial Dataset: A trial dataset was released on August 30, 2009; it contains data for the first five relations listed above. It also includes some references to the remaining four relations, which can be treated as Other when experimenting with the trial dataset.
* Training Dataset: The training dataset consists of about 700 examples for each of the nine relations and for the additional Other relation; a total of about 7,000 examples.
* Development Dataset: The development dataset consists of about 100 examples for each of the nine relations and for the additional Other relation; a total of about 1,000 examples.
* Test Dataset: The test dataset contains about 200 examples for each of the nine relations and for the additional Other relation; a total of about 2,000 examples.
License: All data are released under the Creative Commons Attribution 3.0 Unported license.
Time Schedule
=============
* Trial data released: August 30, 2009
* Training+development data release: February 26, 2010
* Test data release: March 18, 2010
* Result submission deadline: 7 days after downloading the *test* data, but no later than April 2
* Organizers send the test results: April 10, 2010
* Submission of description papers: April 17, 2010
* Notification of acceptance: May 6, 2010
* SemEval'2010 workshop (at ACL): July 15-16, 2010
Task Organizers
===============
* Iris Hendrickx (University of Lisbon, University of Antwerp)
* Su Nam Kim (University of Melbourne)
* Zornitsa Kozareva (University of Southern California, Information Sciences Institute)
* Preslav Nakov (National University of Singapore)
* Diarmuid Ó Séaghdha (University of Cambridge)
* Sebastian Padó (Stuttgart University)
* Marco Pennacchiotti (Saarland University, Yahoo! Research)
* Lorenza Romano (FBK-irst, Italy)
* Stan Szpakowicz (University of Ottawa)
Useful Links
============
Interested in participating in the shared task? Please join the following Google group: http://groups.google.com.sg/group/semeval-2010-multi-way-classification-of-semantic-relations?hl=en
Task #8 website: http://docs.google.com/View?docid=dfvxd49s_36c28v9pmw
SemEval 2010 website: http://semeval2.fbk.eu/semeval2.php