[Corpora-List] Data collections available: Robust WSD-CLIR task at CLEF2008

Eneko Agirre e.agirre at ehu.es
Fri Feb 13 13:04:10 CET 2009

Release of data collections for the Robust WSD CLIR exercise at CLEF2008

Word Sense Disambiguation

for (Cross-Lingual) Information Retrieval


The CLEF 2008 robust task brought semantic and retrieval evaluation together. Participants were offered topics and document collections from previous CLEF campaigns, annotated by word sense disambiguation (WSD) systems. The goal of the task was to test whether WSD could benefit retrieval systems, and some results were positive (see the working notes at

In preparation for the 2009 Robust task (to be announced soon), we have compiled all the data needed to replicate the 2008 experiments, including topics, relevance judgements, and an unordered version of the LA94 and GH95 document collections with WSD data.

The WSD information is based on WordNet version 1.6 and was supplemented with data from the English and Spanish WordNets in order to test different expansion strategies. Several leading WSD experts ran their systems and provided the results for participants to use.

The robust task used two languages that have often featured in previous CLEF campaigns: English and Spanish. Documents are in English, and topics are in both English and Spanish, so the task covered both monolingual and cross-lingual Information Retrieval.

For more details please visit http://ixa2.si.ehu.es/clirwsd

For information on future developments, please join the mailing list:

