[Corpora-List] Community Question Answering Challenge at SemEval-2017

Alessandro Moschitti amoschitti at gmail.com
Thu Jan 5 12:54:33 CET 2017

Call for Participation to the SemEval 2017 Task 3 challenge: Community Question Answering (cQA)

Website: http://alt.qcri.org/semeval2017/task3

Essential dates:
- Evaluation period: January 09-30, 2017 (small change)
- Paper submission: February 27, 2017
- SemEval workshop: August 03, 2017

(2-day workshop at ACL 2017, Vancouver, Canada).

Contacts:
Email: semeval-cqa at googlegroups.com
Google Group: semeval-cqa at googlegroups.com

Organizers:
Preslav Nakov, Lluís Màrquez, Alessandro Moschitti (Qatar Computing Research Institute, HBKU)
Timothy Baldwin, Doris Hoogeveen, Karin Verspoor (University of Melbourne)

Summary: The challenge concerns systems for automatically selecting questions and answers in cQA ecosystems. Participants develop ranking models: given (i) a new question and (ii) a large collection of previously asked questions and comment threads, created by a user community, the goal is to rank the questions and/or the comments from these threads in order of usefulness for answering the new question.

Main features:

1. Real-world application scenarios:

- Qatar Living forum, Medical cQA, StackExchange

- Technology immediately usable in various cQA applications

2. Large annotated corpus for question-question similarity defined by a cQA application

3. Opportunity to model and explore a variety of research directions:

- building systems for the individual components of cQA, which can then be merged into a more complex model to solve the overall task;

- designing question similarity components (typically needed in cQA) for improving answer selection (required by traditional QA);

- modeling the interaction among the answers in threads for improving answer selection; and

- modeling the interaction among threads using question-question similarity;

(these directions involve textual inference as in traditional QA, textual entailment, and semantic similarity, applied to challenging social media text)

4. Multilinguality: offered in Arabic and English

5. Multiple domains with real-life questions.


Subtask A (English): Question-Comment Similarity Given a question and the first 10 comments in its question thread, rerank these 10 comments according to their relevance with respect to the question.

Subtask B (English): Question-Question Similarity Given a new question (aka original question) and the set of the first 10 related questions (retrieved by Google), rerank the related questions according to their similarity with the original question.

Subtask C (English): Question-External Comment Similarity -- this is the main English subtask. Given a new question (aka the original question) and the set of the first 10 related questions (retrieved by Google), each associated with the first 10 comments appearing in its thread, rerank the 100 comments (10 questions x 10 comments) according to their relevance with respect to the original question.

Subtask D (Arabic): Rerank the correct answers for a new question. Given a new question (aka the original question) and the set of related questions (retrieved by Google), each associated with one correct answer, rerank the question-answer pairs according to their relevance with respect to the original question.

Subtask E (English): This subtask is similar to subtask B, Question-Question Similarity, but on a larger scale and incorporating multiple domains simultaneously. It uses the English CQADupStack corpus, which comprises 7,214,697 threads across 12 subforums extracted from StackExchange. StackExchange users mark questions as duplicates when they notice that the questions have already been answered; such data provides a large resource for exploring semantic similarity or entailment among questions and question-answer threads. In more detail, given a new question (aka the original question) and a set of 50 candidate questions, the task is to rerank the candidates according to their relevance with respect to the original question, and to truncate the result list so that only "PerfectMatch" questions appear in it.

Note: all subtasks are optional; participating teams may take part in any subset of the subtasks described above.
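All subtasks are cast as reranking problems, and systems are scored with ranking measures such as Mean Average Precision (MAP) over the reranked lists (see the Task Description page for the exact official metric definitions). As an informal sketch, MAP over binary relevance labels can be computed as below; the example labels are invented for illustration only:

```python
def average_precision(relevance):
    """Average precision for one ranked list of binary relevance labels
    (1 = relevant comment/question, 0 = not relevant)."""
    hits, score = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / rank  # precision at each relevant position
    return score / hits if hits else 0.0

def mean_average_precision(ranked_lists):
    """MAP: mean of the per-query average precisions."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

# Two hypothetical questions, each with 10 reranked comments
# (labels are made up for this sketch):
q1 = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
q2 = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(round(mean_average_precision([q1, q2]), 4))  # prints 0.6417
```

A better reranker moves relevant items toward the top of each list, which raises the per-position precision terms and hence the MAP.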

More Information:

- For a longer introduction, please refer to: http://alt.qcri.org/semeval2017/task3

- For a precise definition of all subtasks and of the evaluation, see the Task Description page: http://alt.qcri.org/semeval2017/task3/index.php?id=description-of-tasks

- The corpora and the tools can be downloaded from the Data and Tools page: http://alt.qcri.org/semeval2017/task3/index.php?id=data-and-tools

- Registration to SemEval 2017: https://goo.gl/jGS9cr (please note that task registration is also required for submitting system results)

Finally, do not miss the important dates (the evaluation period is January 09-30, 2017):

- Mon 05 Sep 2016: Training data ready for all tasks
- Mon 09 Jan 2017: Release of the test data for subtasks A-D
- Mon 21 Jan 2017: Release of the test data for subtask E
- Mon 30 Jan 2017: Deadline for the final submission on CodaLab
- Mon 06 Feb 2017: Results posted
- Mon 27 Feb 2017: Paper submissions due
- Mon 03 Apr 2017: Author notifications
- Mon 17 Apr 2017: Camera-ready submissions due
- Thu 03 Aug 2017: SemEval workshop at ACL 2017, Vancouver, Canada (2-day workshop)

