[Corpora-List] ECML/PKDD 2016 3rd Discovery Challenge: Learning to Re-Rank Questions for Community Question Answering

Alessandro Moschitti amoschitti at gmail.com
Thu Jun 23 12:53:36 CEST 2016

Dear all, I would like to draw your attention to an interesting challenge on a novel IR/NLP application:

ECML/PKDD 2016 3rd Discovery Challenge: http://alt.qcri.org/ecml2016/

**cQA Challenge: Learning to Re-Rank Questions for Community Question Answering**

The challenge consists of designing innovative and powerful machine learning algorithms for building automatic question rerankers.

Participants are provided with data in which each original question from a forum user is associated with other forum questions retrieved by Google, using the original question as the query.

The relevance of each retrieved question is manually annotated.

The available representations consist of (i) feature vectors, e.g., based on several text similarity measures, and (ii) Gram matrices built with tree kernels applied to pairs of syntactic/semantic trees of natural language questions, see:


Participants are asked to use the advanced semantic representations of questions made available by the organizers for designing supervised approaches to question reranking.

How best to exploit such rich semantic information in learning-to-rank algorithms is an exciting research question with real-world applications, e.g., improving search engines for high-level semantic tasks.

For more information please see below.



Abstract
========

Due to the widespread use of Web forums, such as Yahoo! Answers or Stack Overflow, there has been a renewed interest in Community Question Answering (cQA). cQA combines traditional question answering with a modern Web scenario, where users pose questions hoping to get the right answers from other users. The most critical problem arises when a new question is asked in the forum. If the user's question is similar (or even semantically equivalent) to a previously posted question, she/he should not have to wait for answers, or for another user to point her/him to the relevant thread already archived in the forum. An automatic system can search for relevant previously posted questions and return the information found immediately.

In this challenge, given a new question and a set of questions previously posted to a forum, together with their corresponding answer threads, a machine learning model must rank the forum questions according to their relevance to the new user question. Although this task involves both Natural Language Processing (NLP) and Information Retrieval, the challenge focuses on the machine learning aspects of reranking the relevant questions. Therefore, we provide participants with both the initial rank and the feature representation of the training and test examples. We extract features from the text of the user and forum questions using advanced NLP techniques, e.g., syntactic parsing. Most interestingly, we also provide the Gram matrices of tree kernels applied to advanced structural tree representations. A few other features express the relevance of the thread comments, associated with the forum questions, to the user question. Participants are expected to exploit these data to build novel and effective machine learning models that rerank the initial question list so as to improve its Mean Average Precision (MAP).
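
As an illustration of the evaluation measure named above, here is a minimal Python sketch of how MAP could be computed over reranked question lists. The function names (`average_precision`, `mean_average_precision`, `rerank`) and the toy labels and scores are hypothetical and not part of the challenge material; the official scorer may differ in details, for instance in how it treats a user question with no relevant forum question.

```python
from typing import List, Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision of one ranked list.

    relevance[i] is True if the forum question at rank i+1 is relevant.
    Assumption: a list with no relevant question scores 0.0.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists: Sequence[Sequence[bool]]) -> float:
    """Mean of the average precision over all user questions."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


def rerank(labels: Sequence[bool], scores: Sequence[float]) -> List[bool]:
    """Reorder relevance labels by descending model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


# Toy data: relevance labels in the initial (search engine) order
# for two user questions, plus hypothetical model scores for the first.
initial_q1 = [False, True, False, True]
initial_q2 = [True, False, False]
scores_q1 = [0.1, 0.9, 0.2, 0.8]

map_before = mean_average_precision([initial_q1, initial_q2])
map_after = mean_average_precision([rerank(initial_q1, scores_q1), initial_q2])
print(f"MAP before reranking: {map_before:.3f}")
print(f"MAP after reranking:  {map_after:.3f}")
```

In this toy case the reranker moves both relevant questions of the first list to the top, so its average precision rises from 0.5 to 1.0.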


Discovery Challenge Chairs

Elio Masciari, ICAR CNR, Italy
Alessandro Moschitti, Qatar Computing Research Institute, HBKU

cQA Challenge Chairs

Alberto Barrón-Cedeño, Qatar Computing Research Institute
Giovanni Da San Martino, Qatar Computing Research Institute
Simone Filice, Università degli Studi di Roma "Tor Vergata"
Preslav Nakov, Qatar Computing Research Institute


Prizes will be awarded to the two best performing teams: € 1,000 to the winner on the test set; € 500 to the winner on the development set. If the same team wins on both sets, the € 500 goes to the first runner-up on the test set.

Important dates

Registration deadline: Friday, July 22, 2016
End of submission period on the development set: Friday, July 22, 2016
Release of the test set: Saturday, July 23, 2016
End of submission period on the test set: Saturday, July 30, 2016
Winner announcement: Monday, August 1, 2016
Deadline for system description report submission (selected only): Sunday, August 7, 2016
