[Corpora-List] CLEF 2013 Labs - Registration is now open

Pamela Forner forner at celct.it
Tue Dec 18 09:08:17 CET 2012

Please distribute widely - Apologies for cross-posting

Registration for the CLEF 2013 Labs is now open.

Please register for the Lab(s) you are interested in by filling in the form at http://www.clef2013.org/index.php?page=Pages/registrationForm.php


CLEF 2013 - Conference and Labs of the Evaluation Forum
Information Access Evaluation meets Multilinguality, Multimodality, and Visualization

23 - 26 September 2013, Valencia - Spain


Call for Labs Participation (Download flyer at http://celct.fbk.eu/clef2013/resources/CFP_Labs_Flyer_2013.pdf)


The CLEF Initiative (Conference and Labs of the Evaluation Forum, formerly known as Cross-Language Evaluation Forum) is a self-organized body whose main mission is to promote research, innovation, and development of information access systems with an emphasis on multilingual and multimodal information with various levels of structure.

The CLEF 2013 conference is next year's edition of the popular CLEF campaign and workshop series, which has run since 2000 and contributes to the systematic evaluation of information access systems, primarily through experimentation on shared tasks. In 2010 CLEF was launched in a new format, as a conference with research presentations, panels, poster and demo sessions, and laboratory evaluation workshops interleaved during three and a half days of intense and stimulating research activities.

Each lab focuses on a particular sub-problem or variant of the retrieval task, as described below. Researchers and practitioners from all segments of the information access and related communities are invited to participate, choosing to take part in any or all evaluation labs.

Ten labs are offered at CLEF 2013. Nine labs will follow a "campaign-style" evaluation practice for specific information access problems in the tradition of past CLEF campaign tracks:

CHiC - Cultural Heritage in CLEF
The CHiC 2013 evaluation lab aims at moving towards a systematic and large-scale evaluation of cultural heritage digital libraries and information access systems. Test collections and queries will come from the cultural heritage digital library Europeana. Three different tasks are planned:
(1) Multilingual ad-hoc and semantic enrichment, assessing IR in a multilingual collection both for ad-hoc IR and query enrichment;
(2) Polish ad-hoc, evaluating Polish-language retrieval;
(3) Interactive, where the evaluation framework is extended to an interactive study observing users during a non-intentional browsing activity. Participants will receive a fixed research protocol and a browsing interface for Europeana data. The data gathered from survey questions and log files will be aggregated over all participants and then used to answer research questions on user behavior and system development.
Lab Coordination: Humboldt-Universität zu Berlin, U. of Padova, U. of Sheffield, Royal School of Library and Information Science, SICS, U. of Neuchatel, U. of Wroclaw, Europeana, U. Nicolaus Copernicus
Lab Website: http://www.promise-noe.eu/chic-2013/home

CLEFeHealth
Discharge summaries describe the course of treatment, the status at release, and care plans. Both nurses and patients are likely to have difficulties in understanding their content, because of their compressed language full of medical jargon, nonstandard abbreviations, and ward-specific idioms. To support the continuum of care, our goal is to develop methods and resources that make discharge documents easier to understand from nurses' and patients' perspectives and address their differing queries and information needs when searching for further details on matters mentioned in the discharge summaries. This could include expanding abbreviations, generalising from trade names to more generic descriptions of medicine, attaching further definitions to difficult phrasings, and having user-centric web search engines available. We annotate, experiment with, and survey these processing and visualization strategies and select a small number of them for method and resource development. Data for these tasks are in English and originate from the i2b2 NLP Research Data Sets and the Khresmoi Medical Information Analysis and Retrieval project.
Lab Coordination: U. California, NICTA, U. Turku, U. Stockholm
Lab Website: http://nicta.com.au/business/health/events/clefehealth_2013

CLEF-IP - Retrieval in the Intellectual Property Domain
The CLEF-IP lab provides a large collection of XML documents representing patents and patent images. On this collection we organize the following tasks:
- Passage retrieval starting from claims: starting from a given claim, participants are asked to retrieve relevant documents in the collection and mark out the relevant passages in these documents;
- Image to text, text to image: given a patent application document (as an XML file) and the set of images occurring in the application, extract the links between the image labels and the text pointing to the object of the image label;
- Image to structure: extract the information in patent images (flowcharts, electrical diagrams) and return it in a predefined textual format.
Lab Coordination: IFS, Vienna University of Technology, Qatar Foundation
Lab Website: http://www.ifs.tuwien.ac.at/~clef-ip/

ImageCLEF 2013 - Cross Language Image Annotation and Retrieval
This lab evaluates the cross-language annotation and retrieval of images by focusing on the combination of textual, visual and multimodal evidence. Three challenging tasks are foreseen:
- Photo Annotation and Retrieval: semantic concept detection using private collection data, and large-scale annotation using general Web data;
- Plant Identification: visual classification of leaves, flowers, fruits, and bark images for the identification of plant species;
- Robot Vision: semantic spatial understanding for a mobile robot using multimodal data.
Lab Coordination: IDIAP Research Institute, U. of Applied Sciences Western Switzerland, Yahoo! Research, U. Politecnica de Valencia, Brandenburg T. U., INRIA, UMR AMAP, U. of Castilla-La Mancha, U. of Alicante.
Lab Website: http://www.imageclef.org/

INEX - INitiative for the Evaluation of XML retrieval
INEX builds evaluation benchmarks for search with rich structure, such as document structure, semantic metadata, entities, or genre/topical structure. INEX studies three different aspects of focused information access:
- Searching structured or semantic data: The Linked Data Track studies ad-hoc search and faceted search over entities in a strongly structured collection of Linked Data (DBpedia) tied to a large textual corpus (Wikipedia).
- Searching professional and user-generated data: The Social Book Search Track studies the value of user-generated descriptions in addition to formal metadata on a collection of Amazon Books and LibraryThing.com data. In addition, the track studies the challenges of searching the full text of scanned books.
- Focused retrieval: First, from the IR perspective, the Snippet Retrieval Track studies how to generate informative snippets for search results. Second, from the NLP perspective, the Tweet Contextualization Track studies tweet contextualization, answering questions of the form "what is this tweet about?" with a synthetic summary of contextual information from Wikipedia, evaluated by both the relevant text retrieved and the "last point of interest."
Lab Coordination: QUT, U. Amsterdam, U. Saarland/MPI
Lab Website: http://inex.mmci.uni-saarland.de/

PAN - Uncovering Plagiarism, Authorship, and Social Software Misuse
PAN offers three tasks:
- Plagiarism Detection: Given a document, is it an original?
- Author Identification: Given a document, who wrote it?
- Author Profiling: Given a document, what is the author's age/gender?
For each of these tasks we have prepared new evaluation resources consisting of large-scale corpora, performance measures, and web services that allow for meaningful evaluations. Our main goal is to provide for sustainable and reproducible evaluations, to get a clear view of the capabilities of state-of-the-art algorithms.
Lab Coordination: Bauhaus-Universität Weimar, U. Politècnica de València, U. of the Aegean, Bar-Ilan University, Duquesne University, U. of Lugano, and Autoritas Consulting.
Lab Website: http://pan.webis.de

QA4MRE - Question Answering for Machine Reading Evaluation
The goal of QA4MRE is to evaluate Machine Reading abilities through Question Answering and Reading Comprehension Tests. The task focuses on the reading of single documents and the selection of answers to a set of questions about information that is stated or implied in the text. While the principal answer is to be found among the facts contained in the test documents provided, systems may use knowledge from additional given texts. Some questions will also test a system's ability to understand extra-propositional aspects of meaning such as modality and negation. Two additional pilots are also proposed:
- Machine Reading of Biomedical Texts about Alzheimer's Disease: aimed at answering questions specific to the biomedical domain, with a special focus on Alzheimer's disease.
- Entrance Exams: aimed at answering multiple-choice questions from real English Reading Comprehension tests contained in Japanese University Entrance Exams.
Lab Coordination: UNED, CMU, CELCT, U. Limerick, U. Antwerp, NII.
Lab Website: http://celct.fbk.eu/QA4MRE/

QALD-3 - Question Answering over Linked Data
QALD-3 is the third in a series of evaluation campaigns on question answering over linked data, for the first time with a strong emphasis on multilinguality. Two open challenges are offered:
- Question answering: given an RDF dataset and a set of natural language questions of varying complexity and in multiple languages, participating systems are asked to provide correct answers (or SPARQL queries that retrieve those answers).
- Ontology lexicalization: focuses on lexica that can facilitate multilingual information access. Participants are asked to find lexicalizations of a set of classes and properties from English DBpedia across languages in a given corpus.
Lab Coordination: Bielefeld University, IBM Research, INRIA, University of Leipzig.
Lab Website: http://www.sc.cit-ec.uni-bielefeld.de/qald

RepLab 2013
RepLab 2013 focuses on the problem of tracking the reputation of companies/individuals on Twitter in real time. This is called a "monitoring" task, where systems have to cluster tweets mentioning a company into topics, and then rank the topics (tweet clusters) by priority. A topic has higher priority if it has strong implications for the reputation of the company; this depends on its polarity for reputation (related, but not identical, to sentiment analysis), on its centrality for the company, on its novelty, on its potential impact, etc. Research groups working on real-time Natural Language Processing, text clustering, sentiment analysis, topic detection and tracking, name disambiguation, etc., are welcome to join RepLab 2013. The organization will provide baseline components for all aspects of the task, so that research groups can test systems that address partial problems (e.g. sentiment analysis). Evaluation results will be provided for the main task (clustering + ranking) and for two subtasks: polarity for reputation and name ambiguity resolution.
Lab Coordination: Llorente & Cuenca, UNED, U. of Amsterdam
Lab Website: http://www.limosine-project.eu/events/replab2013

One lab will be run as a workshop, organized as a speaking and discussion session to explore issues of evaluation methodology, metrics, and processes in information access and closely related fields:

CLEF-ER workshop - Entity Recognition
The CLEF-ER workshop is run as part of the CLEF framework and the EC Mantra project. The workshop is set up to address entity recognition in biomedical text, in different languages and at a large scale. Semantic integration is and will be an important focus. The current tasks are motivated by the partner organisations of the EC-funded Mantra project. The workshop will bring together stakeholders from different domains and researchers who take part in the Mantra challenge. The researchers will explore the evaluation and results of the Mantra challenge from the first half of 2013 and provide input, such as proposals for novel tasks and evaluations, for future challenges. The current Mantra challenge targets the identification of entity mentions and their concept identifiers (CUIs) from a standard terminological resource in multilingual texts. To this end, parallel biomedical corpora have been prepared. These corpora are also exploited to identify entity correspondences and to augment multilingual terminologies.
Lab Coordination: U. Zürich, Julie Lab / U. of Jena, Erasmus University Medical Center (Rotterdam, NL)
Lab Website: http://www.clefer.org

DATA
Training and test data are provided by the organizers, allowing participating systems to be evaluated and compared in a systematic way.

TIMELINE
The expected timeline for the 2013 Labs is as follows (dates vary slightly from task to task; see the individual task pages for the individual deadlines):

December 15 2012                  Labs Registration opens
December 15 2012 - May 1 2013     Evaluation Campaign
June 15 2013                      Submission of CLEF 2013 Working Notes
June 30 2013                      Submission of CLEF 2013 Labs Overviews
June 30 - July 7 2013             Review of Labs Overviews
September 23-26 2013              CLEF 2013 Conference

WORKSHOPS
The lab sessions will take place at the site of the conference in Valencia. The labs will present their overall results in "overview presentations" during the plenary scientific paper sessions, to allow non-participants to get a sense of where the research frontiers are moving. The workshops will be used as a forum for presentation of results (including failure analyses and system comparisons), description of retrieval techniques used, and other issues of interest to researchers in the field. Some groups will be invited to present their results in a joint poster session.

PUBLICATION
All institutions participating in the evaluation labs are asked to submit a paper (Working Notes), which will be published in the online Proceedings. All Working Notes will be published with an ISBN on the conference website.

================================
Pamela Forner
CELCT (web: www.celct.it)
Center for the Evaluation of Language and Communication Technologies
Via alla Cascata 56/c
38100 Povo - TRENTO - Italy

email: forner at celct.it
tel.: +39 0461 314 804
fax: +39 0461 314 846
Secretary Phone: +39 0461 314 870
