[Corpora-List] Final Call for Participation: Word Sense Induction and Disambiguation for Graded Senses

David Alan Jurgens jurgens at di.uniroma1.it
Thu Feb 14 20:31:14 CET 2013


Final Call for Participation: Word Sense Induction and Disambiguation for Graded Senses

(SemEval-2013, Task 13)

http://www.cs.york.ac.uk/semeval-2013/task13/

In keeping with the strong tradition of Word Sense Disambiguation at SenseEval and SemEval, we are pleased to invite participants to SemEval-2013 Task 13 on word senses with graded applicability in context.

Previous tasks on word senses have largely assumed that each usage of a word is best labeled with a single sense. In contrast, Task 13 proposes that usages should be labeled with all senses that apply, with weights indicating each sense's degree of applicability. This multi-sense labeling captures both cases where multiple related senses from a fine-grained sense inventory apply and cases where contextual ambiguity permits alternate interpretations. We illustrate this with three example sentences:

- The student loaded paper into the printer.

- The student submitted her paper by email.

- The student handed her paper to the teacher at the beginning of class.

according to the first two senses of paper in WordNet 3.1:

1. paper - a material made of cellulose pulp derived mainly from wood or rags or certain grasses

2. paper - an essay, especially one written as an assignment

The first sentence refers to the material sense of paper, while the second refers to the essay sense. In contrast, both senses are possible interpretations in the third sentence, though to different degrees; here, the usage evokes both the concept's form (a cellulose material) and its purpose (an assignment), which are themselves distinct senses of paper. Similar multi-label cases can also be constructed for word uses where a reader perceives multiple, unrelated interpretations due to contextual ambiguity. While most previous work on WSD makes a best guess as to which interpretation is correct, Task 13 makes this ambiguity explicit in the multi-sense labeling.

*Task*

Task 13 evaluates Word Sense Induction (WSI) and unsupervised WSD systems in two settings: (1) a WSD task and, for sense induction systems, (2) a clustering comparison setting that evaluates the similarity of the sense inventories. Participants are presented with example contexts of each word and are asked to label each usage with as many senses as they think are applicable, along with numeric weights denoting the relative levels of applicability. The target words are balanced across parts of speech (nouns, verbs, and adjectives) and number of senses. In addition, the data set includes several highly polysemous words (15+ senses) for each part of speech. Word senses are drawn from WordNet 3.1.
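As an informal illustration only (not the official submission format), a graded labeling of a single usage can be thought of as a mapping from sense identifiers to applicability weights, which may then be normalized to relative proportions. The sense identifiers and weights below are invented for the example:

```python
# Sketch of a graded sense labeling for one word usage.
# Sense identifiers and weights are illustrative, not official task data.

def normalize(labeling):
    """Rescale applicability weights so they sum to 1."""
    total = sum(labeling.values())
    return {sense: w / total for sense, w in labeling.items()}

# "The student handed her paper to the teacher ..." -- both the
# material sense and the essay sense apply, with different weights.
usage = {
    "paper#material": 1.0,  # sense 1: cellulose material (weight invented)
    "paper#essay": 3.0,     # sense 2: written assignment (weight invented)
}

graded = normalize(usage)
print(graded)  # the essay sense receives relative weight 0.75
```

A normalized labeling like this makes the annotators' relative preference between senses explicit rather than forcing a single-sense choice.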

*Participation*

The focus of this task is on unsupervised systems, and we therefore solicit participation from two types of systems. First, following previous SemEval tasks on WSI, we solicit systems that induce the senses themselves and then label the test data using their induced senses. Second, we also solicit unsupervised WSD systems based on WordNet 3.1, which label the test data using the same sense inventory.

Both types of systems will be evaluated jointly on the first subtask using a series of graded sense label comparisons. Sense induction systems will additionally be evaluated using unsupervised clustering measures.
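To give an intuition for what a graded sense label comparison involves (this is a simplified sketch, not the task's official scoring measure, which is defined on the task website), one could compare a system's weight vector against a gold-standard one with cosine similarity over the shared sense inventory:

```python
# Illustrative graded-label comparison via cosine similarity.
# The official Task 13 measures may differ; sense names and weights are invented.
import math

def cosine(gold, system):
    """Cosine similarity between two sense-weight mappings.

    Senses absent from a mapping are treated as weight 0, so labeling
    only the wrong sense scores 0 against the gold labeling.
    """
    senses = set(gold) | set(system)
    dot = sum(gold.get(s, 0.0) * system.get(s, 0.0) for s in senses)
    norm_g = math.sqrt(sum(w * w for w in gold.values()))
    norm_s = math.sqrt(sum(w * w for w in system.values()))
    if norm_g == 0.0 or norm_s == 0.0:
        return 0.0
    return dot / (norm_g * norm_s)

# Gold says the essay sense dominates; the system agrees on the ranking
# but spreads its weight differently.
gold = {"essay": 0.75, "material": 0.25}
system = {"essay": 0.6, "material": 0.4}
print(round(cosine(gold, system), 3))  # close to 1.0: rankings agree
```

A measure of this kind rewards systems for matching both the set of applicable senses and their relative weights, rather than only the single top sense.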

*Data*

Because the task targets unsupervised WSD and WSI systems, no training data is provided. However, for WSI systems, the ukWaC corpus <http://wacky.sslmit.unibo.it/doku.php?id=corpora> has been designated as the official dataset from which senses are to be induced. In contrast to past SemEval tasks on WSI, a significantly larger corpus is being used in order to facilitate all-words WSI methods.

Trial data for the task has been released and includes an example dataset of eight words for the first subtask, along with the evaluation measures.

*Important Dates*

Please note that interested parties should register even if they later decide not to submit a system.

August 7, 2012: Trial Data 1.0 released
November 1, 2012: Start of evaluation period
January 10, 2013: Trial Data 1.1 released
*February 15, 2013*: Registration deadline for task participants
March 15, 2013: End of evaluation period
April 9, 2013: Paper submission deadline (to be confirmed)

*Organizers*

David Jurgens (lastname at di.uniroma1.it), Sapienza University of Rome, Italy
Ioannis Klapaftis (lastname at outlook.com), Microsoft (Bing), United Kingdom

*Contact*

If interested, please join the discussion on our Google Group: https://groups.google.com/group/semeval-2013-task-13.
