[Corpora-List] Call for participation: Robust WSD-CLIR task at CLEF2009

Eneko Agirre e.agirre at ehu.es
Tue Feb 17 15:35:42 CET 2009


Apologies for cross-postings

Call for participation

Robust WSD CLIR at CLEF2009

Word Sense Disambiguation

for (Cross-Lingual) Information Retrieval

http://ixa2.si.ehu.es/clirwsd

Following the success of the 2007 joint SemEval-CLEF task and the 2008 Robust WSD task at CLEF, a follow-up task will be held in 2009 with the aim of exploring the contribution of Word Sense Disambiguation to monolingual and multilingual Information Retrieval. The 2009 exercise will be very similar to the 2008 one. Those interested in exploring the 2008 data can find it here: http://ixa2.si.ehu.es/clirwsd/index.php?option=com_content&task=view&id=19&Itemid=35

The robust task will bring semantic and retrieval evaluation together. Participants will be offered topics and document collections from previous CLEF campaigns that have been annotated by word sense disambiguation (WSD) systems. The goal of the task is to test whether WSD can benefit retrieval systems.

The organizers believe that polysemy is one of the reasons why information retrieval (IR) systems fail, and that WSD could allow more targeted retrieval. Robust-WSD at CLEF 2008 showed that some top-scoring systems improved their IR and CLIR results with the use of WSD tags. See the working notes: http://www.clef-campaign.org/2008/working_notes/adhoc-final.pdf

The WSD data is based on WordNet version 1.6 and will be supplemented with data from the English and Spanish WordNets in order to test different expansion strategies. Several leading WSD experts will run their systems and provide their WSD results for participants to use.
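One kind of expansion strategy that sense-tagged topics plus aligned wordnets make possible can be sketched as follows: take the highest-scoring sense assigned to each topic word and add that synset's lemmas in the document language. This is a minimal Python illustration, not the organizers' method or data format. The synset ids, the (token, {sense: score}) input structure, the SYNSETS table and the min_score threshold are all hypothetical.

```python
# Hypothetical sense inventory: synset id -> lemmas per language.
# Real data would come from WordNet 1.6 and the English/Spanish wordnets;
# these ids and entries are made up for illustration.
SYNSETS = {
    "syn-bank-river": {"en": ["bank", "riverbank"], "es": ["orilla", "ribera"]},
    "syn-bank-finance": {"en": ["bank", "depository financial institution"], "es": ["banco"]},
    "syn-bench": {"en": ["bench"], "es": ["banco"]},
}


def expand_query(tagged_query, synsets, langs=("en",), min_score=0.5):
    """Expand each token with the lemmas of its highest-scoring sense.

    tagged_query: list of (token, {synset_id: score}) pairs, a hypothetical
    stand-in for the WSD annotations distributed with the topics.
    langs: languages whose lemmas are added ("en" for English documents).
    min_score: if the best sense scores below this, the token is not expanded.
    """
    terms = []
    for token, senses in tagged_query:
        terms.append(token)  # always keep the original word
        if not senses:
            continue
        best_sense, score = max(senses.items(), key=lambda item: item[1])
        if score < min_score:
            continue  # disambiguation too uncertain; do not expand
        for lang in langs:
            for lemma in synsets.get(best_sense, {}).get(lang, []):
                if lemma not in terms:  # avoid duplicate query terms
                    terms.append(lemma)
    return terms


if __name__ == "__main__":
    spanish_topic = [("banco", {"syn-bank-finance": 0.8, "syn-bench": 0.2}), ("interés", {})]
    print(expand_query(spanish_topic, SYNSETS))
```

For the Spanish topic word "banco", the higher score for the financial sense adds English lemmas such as "bank" rather than "bench", which is one way sense tags could help match Spanish topics against English documents. Expanding with several senses, or weighting lemmas by sense score, are other variants one might compare.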

Participants are required to submit at least one baseline run without WSD and one run using the WSD data. They may submit up to four further baseline runs without WSD and up to four further runs using WSD in various ways.
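The run limits above amount to a small set of rules: at least one run of each type, and at most five of each (one required plus four further). A minimal Python sketch of a pre-submission check follows, assuming the limits apply to a submission as a whole; the call does not say whether they are counted per language pair. The (run_id, uses_wsd) representation and the run names are hypothetical.

```python
# Assumed reading of the call: 1 required + up to 4 further runs of each type.
MAX_RUNS_PER_TYPE = 5


def check_runs(runs):
    """Check a planned submission against the run limits in the call.

    runs: list of (run_id, uses_wsd) pairs; this representation is hypothetical.
    Returns a list of problem descriptions (empty if the plan is acceptable).
    """
    baseline = sum(1 for _, uses_wsd in runs if not uses_wsd)
    with_wsd = len(runs) - baseline
    problems = []
    if baseline < 1:
        problems.append("at least one baseline run without WSD is required")
    if with_wsd < 1:
        problems.append("at least one run using the WSD data is required")
    if baseline > MAX_RUNS_PER_TYPE:
        problems.append(f"too many baseline runs: {baseline} > {MAX_RUNS_PER_TYPE}")
    if with_wsd > MAX_RUNS_PER_TYPE:
        problems.append(f"too many WSD runs: {with_wsd} > {MAX_RUNS_PER_TYPE}")
    return problems


if __name__ == "__main__":
    plan = [("base-1", False), ("wsd-1", True), ("wsd-2", True)]
    print(check_runs(plan) or "submission plan OK")
```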

The robust task will use two languages often used in previous CLEF campaigns (English, Spanish). Documents will be in English, and topics in both English and Spanish.

The evaluation will be based on Mean Average Precision (MAP) as well as Geometric Mean Average Precision (GMAP). The robust measure GMAP is intended to reward stable performance across all topics rather than high average performance, in both mono- and cross-language IR ("ensure that all topics obtain minimum effectiveness level", Voorhees 2005, SIGIR Forum).
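To make the difference between the two measures concrete, here is a minimal Python sketch (not the official CLEF evaluation software). It computes non-interpolated average precision (AP) per topic, then MAP as the arithmetic mean and GMAP as the geometric mean of the per-topic scores. The floor value epsilon, which keeps a single zero-AP topic from collapsing the geometric mean to zero, is an assumption; evaluation tools may differ in the exact constant and in how it is applied.

```python
import math


def average_precision(ranking, relevant):
    """Non-interpolated average precision for one topic.

    ranking: document ids in the order returned by the system.
    relevant: set of document ids judged relevant for the topic.
    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant)


def mean_average_precision(ap_scores):
    """MAP: arithmetic mean of per-topic AP."""
    return sum(ap_scores) / len(ap_scores)


def geometric_map(ap_scores, epsilon=1e-5):
    """GMAP: geometric mean of per-topic AP.

    Each AP is floored at epsilon (an assumed constant) so that one topic
    with AP = 0 does not force the whole mean to zero.
    """
    log_sum = sum(math.log(max(ap, epsilon)) for ap in ap_scores)
    return math.exp(log_sum / len(ap_scores))


if __name__ == "__main__":
    # Hypothetical per-topic AP scores for two systems on the same two topics.
    for name, scores in (("stable", [0.5, 0.5]), ("uneven", [0.9, 0.1])):
        print(name, round(mean_average_precision(scores), 3), round(geometric_map(scores), 3))
```

Both hypothetical systems get a MAP of 0.5, but GMAP is 0.5 for the stable one and 0.3 for the uneven one, so GMAP penalizes a system that does very badly on some topics even when its average is unchanged.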

Time Schedule:

Registration Opens - 1 February 2009 (closes on 1 May)

Data Release - from 15 March 2009

Topic Release - 24 April 2009

Submission of Runs by Participants - 1 June 2009

Release of Relevance Assessments and Individual Results - from 26 June 2009

Submission of Paper for Working Notes - around August 2009 (to be announced)

Workshop - 30 September to 2 October 2009 (co-located with ECDL 2009)

Contact

Thomas Mandl, University of Hildesheim, mandl at uni-hildesheim de

Eneko Agirre, University of the Basque Country, e.agirre at ehu es

For more details please visit

http://ixa2.si.ehu.es/clirwsd

http://www.clef-campaign.org

Please join the mailing list:

http://groups.google.com/group/clirwsd
