[Corpora-List] CLEF 2012 Labs - First Call for Participation

Pamela Forner forner at celct.it
Fri Jan 20 15:25:34 CET 2012

**Apologies if you receive multiple copies. Please distribute this call among potentially interested colleagues.**

CLEF 2012 Conference and Labs of the Evaluation Forum
Information Access Evaluation meets Multilinguality, Multimodality, and Visual Analytics


CLEF 2012 Labs - First Call for Participation -

CLEF 2012 is this year’s edition of the popular CLEF campaign and workshop series (http://www.clef-initiative.eu//), which has run since 2000 and contributes to the systematic evaluation of information access systems, primarily through experimentation on shared tasks. In 2010 CLEF was launched in a new format, as a conference with research presentations, panels, poster and demo sessions, and laboratory evaluation workshops. Labs fall into two types: laboratories to conduct evaluation of information access systems, and workshops to discuss and pilot innovative evaluation activities. In 2012, CLEF will take place on September 17-20 in Rome, and researchers and practitioners from all segments of the information access and related communities are invited to participate in the following Evaluation Labs:

CHiC - Cultural Heritage in CLEF
The CHiC 2012 pilot evaluation lab aims at moving towards a systematic and large-scale evaluation of cultural heritage digital libraries and information access systems. Test collections and queries will come from the cultural heritage domain (in 2012, data from Europeana). Tasks will combine conventional system-oriented evaluation scenarios (e.g. ad-hoc retrieval and semantic enrichment), for comparison with other domains, with a scenario customized for the CH domain: a variability task that presents a particularly good overview (“must sees”) of the different object types and categories in the collection, targeted towards a casual user.
Lab Coordinators: Berlin School of Library and Information Science, Humboldt-Universität zu Berlin (DE); Department of Information Engineering, U. of Padova (IT); Royal School of Library and Information Science, Copenhagen (DK); The Information School, U. of Sheffield (UK); Europeana, The Hague (NL)
Lab Webpage: http://www.promise-noe.eu/chic-2012/home

CLEF-IP - IR in the IP domain
The CLEF-IP lab provides a large collection of XML documents representing patents and patent images. On this collection we organize the following four tasks:
- Passage Retrieval starting from claims: Starting from a given claim, participants are asked to retrieve relevant documents in the collection and mark the relevant passages in these documents.
- Matching Claim to Description in a single document (Pilot): Starting from the claims of a patent application, participants are asked to indicate the paragraphs in the application's description section (same document) that best explain the contents of the given claim.
- Flowchart Recognition Task: Extract the information in flowchart images and return it in a predefined textual format.
- Chemical Structure Recognition Task: Starting from TIFF images containing patent scans, participants are asked to identify the location of the chemical structures depicted on these pages and, for each of them, return the corresponding structure in a MOL file (a chemical structure file format).
Lab Coordinators: Vienna University of Technology (AT), SAIC-Frederick Inc. (US), Fraunhofer SCAI (DE), U. of Birmingham (UK)
Lab Webpage: http://ifs.tuwien.ac.at/~clef-ip/

ImageCLEF - Cross Language Image Retrieval
This lab evaluates the cross-language annotation and retrieval of images by focusing on the combination of textual and visual evidence. Four challenging tasks are foreseen for ImageCLEF 2012:
- Medical image modality classification and medical image retrieval with visual, semantic and mixed topics in several languages, using a data collection from the biomedical literature;
- A photo annotation task that investigates automated semantic concept detection and concept-based retrieval using Flickr data, and large-scale annotation using general Web data;
- Visual classification of leaf images for the identification of plant species; and
- Semantic localisation of a mobile robot using multimodal place classification, with a special focus on generalization.
In addition, a practical showcase is planned at the conference for the real-time evaluation of interactive image search systems.
Lab Coordinators: U. of Applied Sciences Western Switzerland (CH), Harvard Medical School (US), National Library of Medicine (US), Nuance Communications (US), CEA LIST (FR), U. Politècnica de Valencia (ES), Yahoo! Research (ES), IDIAP (CH), INRA-AMAP (FR), and INRIA (FR)
Lab Webpage: http://www.imageclef.org/

INEX - INitiative for the Evaluation of XML Retrieval
INEX has been pioneering structured retrieval since 2002 and will join forces with CLEF, running five tracks:
- Social Book Search Track: studying the value of user-generated descriptions in addition to formal metadata on a collection of Amazon Books and LibraryThing.com data.
- Data Centric Track: studying ad hoc search and faceted search on a collection of Linked Data (DBpedia) tied to a large corpus (Wikipedia).
- Snippet Retrieval Track: studying the generation of informative snippets with sufficient information to determine the relevance of search results.
- Show Me Your Code Track: asking participants to submit system components (in particular feedback) rather than results.
- Tweet Contextualization Track: retrieving synthetic contextual information from Wikipedia in response to a tweet with a URL on a small terminal like a phone.
Lab Coordinators: Queensland University of Technology (AU), University of Amsterdam (NL), Saarland University/MPI (DE), and the track organizers
Lab Webpage: http://inex.mmci.uni-saarland.de/

PAN - Uncovering Plagiarism, Authorship, and Social Software Misuse
PAN offers three tasks:
- Plagiarism Detection: This task features a new plagiarism corpus based on the ClueWeb09, the new search engine ChatNoir which indexes the corpus, the cloud-based algorithm evaluation architecture TIRA, and, for the first time, real plagiarism cases. At the conference, keynotes about cross-language plagiarism detection will be given by Roberto Navigli (Università La Sapienza) and Ralf Steinberger (European Commission, JRC).
- Author Identification: This task focuses on identifying sexual predators in chat logs and on authorship verification. Moreover, it features for the first time real cases of disputed authorship.
- Quality Flaw Prediction in Wikipedia: This newly introduced task is about identifying Wikipedia articles that contain certain information quality flaws. It generalizes the vandalism detection task of last year.
Lab Coordinators: Bauhaus-Universität Weimar (DE), U. Politécnica de Valencia (ES), U. of the Aegean (GR), Bar-Ilan University (IL), Illinois Institute of Technology (US), Duquesne University (US), and U. of Lugano (CH)
Lab Webpage: http://pan.webis.de

QA4MRE - Question Answering for Machine Reading Evaluation
The goal of QA4MRE is to evaluate Machine Reading abilities through Question Answering and Reading Comprehension Tests. The task focuses on the reading of single documents and the identification of the answers to a set of questions about information that is stated or implied in the text. Questions are multiple choice, each with five options and only one correct answer. Participating systems will be required to answer each question by choosing one of the five alternatives. Systems may use knowledge from the given texts to assist with answering the questions; however, the principal answer is to be found among the facts contained in the given test documents. Two additional pilots are also proposed:
- Processing Modality and Negation for Machine Reading: aimed at evaluating whether systems are able to understand extra-propositional aspects of meaning like modality and negation.
- Machine Reading of Biomedical Texts about Alzheimer: aimed at setting questions in the biomedical domain with a special focus on Alzheimer's disease.
Lab Coordinators: UNED (ES), ISI (US), CELCT (IT), University of Limerick (IE), University of Antwerp (BE)
Lab Webpage: http://celct.fbk.eu/QA4MRE/

RepLab 2012
Online Reputation Management deals with the image that online media project about individuals and organizations. The growing relevance of social media and the speed at which facts and opinions travel in microblogging networks make online reputation an essential part of a company's public relations. The aim of RepLab is to bring together the Information Access research community (including researchers and developers) with representatives from the Online Reputation Management industry, with the ultimate goals of (i) establishing a five-year roadmap on the topic that includes a description of the language technologies required in terms of resources, algorithms, and applications; (ii) specifying suitable evaluation methodologies and metrics to measure scientific progress; and (iii) developing test collections that enable systematic comparison of algorithms and reliable benchmarking of commercial systems.
In 2012, RepLab will organize two shared tasks on Twitter data:
(i) a monitoring task, where the goal is to thematically cluster tweets that include a company's name, as a step towards early alerting on issues that may damage the company's reputation; and
(ii) a profiling task, where the goal is to annotate tweets according to their polarity for reputation (i.e. whether their content has positive or negative implications for the company's reputation).


CLEFeHealth 2012 is a one-day workshop on cross-language evaluation of methods, applications, and resources for eHealth document analysis, with a focus on written and spoken natural-language processing. We invite research, industry and government representatives to develop with us a roadmap towards the vision of using systematically evaluated ICT tools to analyse and integrate eHealth documents across languages, genres, and jargons. We call for 1-2 page abstracts on: (a) evaluation of mono- and multilingual methods, applications and resources for eHealth document analysis; and (b) development of statistical and user-feedback based evaluation protocols, settings, methods and measures for cross-language evaluation of methods, applications, and resources for eHealth document analysis. We have a double-blind review process, so please note that the submission deadline is May 2012.
Lab Coordinators: National ICT Australia (NICTA)
Lab Webpage: www.nicta.com.au/clefehealth2012

================================
Pamela Forner
CELCT (web: www.celct.it)
Center for the Evaluation of Language and Communication Technologies
Via alla Cascata 56/c
38100 Povo – TRENTO – Italy

email: forner at celct.it
tel.: +39 0461 314 804
fax: +39 0461 314 846
Secretary Phone: +39 0461 314 870
