[Corpora-List] Call for Task Proposals - SemEval 2016

Zesch, Torsten torsten.zesch at uni-due.de
Thu Dec 4 12:29:38 CET 2014

SemEval-2016: International Workshop on Semantic Evaluations

Call for Task Proposals

We invite proposals for tasks to be run as part of SemEval-2016: http://alt.qcri.org/semeval2016/

SemEval (Semantic Evaluation) is an ongoing series of evaluations of computational semantic analysis systems, organized under the umbrella of SIGLEX, the Special Interest Group on the Lexicon of the Association for Computational Linguistics. As of 2014, it runs yearly, but in overlapping two-year cycles (e.g., the SemEval-2016 schedule below spans 2015 and 2016).

The SemEval evaluations explore the nature of meaning in natural languages in practical terms, by providing an emergent mechanism to identify the problems (e.g., how to characterize meaning and what is necessary to compute it) and to explore the strengths of possible solutions by means of standardized evaluation on shared datasets. SemEval evaluations initially focused on identifying word senses computationally, but have since grown to investigate the interrelationships among the elements in a sentence (e.g., semantic relations, semantic parsing, semantic role labeling), relations between sentences (e.g., coreference), and author attitudes (e.g., sentiment analysis), among other research directions.

See the SemEval Wikipedia entry (http://en.wikipedia.org/wiki/SemEval) for a more detailed historical overview. SemEval-2016 will be the 10th workshop on semantic evaluation. You can also check the websites of previous editions of SemEval to get an idea about the range of tasks explored, e.g., for SemEval-2015: http://alt.qcri.org/semeval2015/

For SemEval-2016, we welcome any task that can test an automatic system for semantic analysis of text, be it application-dependent or application-independent. We especially welcome tasks for different languages, cross-lingual tasks, tasks requiring semantic interpretation (e.g., metaphor interpretation), and tasks with both intrinsic and application-based evaluation.

We encourage the following aspects in task design:

* BASELINE SYSTEMS, FORMAT CHECKERS, SCORERS * Task organizers should provide format checkers and standard scorers to task participants. Moreover, in order to lower the obstacles to participation, we encourage task organizers to provide baseline systems that participants can use as a starting point. A baseline system typically contains code that reads the data, creates a baseline response (e.g., random guessing, majority class prediction, etc.), and outputs the evaluation results. Whenever possible, baseline systems should be written in widely used programming languages and/or should be implemented as a component for standard NLP pipelines such as UIMA or GATE.
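As a minimal sketch of what such a baseline system might look like (the label set and data here are hypothetical, purely for illustration), a majority-class baseline plus accuracy scorer can fit in a few lines of Python:

```python
from collections import Counter

def majority_class_baseline(train_labels, test_labels):
    """Predict the most frequent training label for every test
    instance and report accuracy, as a trivial scorer would."""
    # Most common label in the training data.
    majority = Counter(train_labels).most_common(1)[0][0]
    predictions = [majority] * len(test_labels)
    correct = sum(p == g for p, g in zip(predictions, test_labels))
    return predictions, correct / len(test_labels)

# Hypothetical sentiment labels, for illustration only.
train = ["pos", "pos", "neg", "pos", "neu"]
test = ["pos", "neg", "pos"]
preds, acc = majority_class_baseline(train, test)
print(preds, round(acc, 2))  # ['pos', 'pos', 'pos'] 0.67
```

A real baseline would additionally read the task's data format and write output in the format expected by the official scorer, which is exactly why distributing a format checker alongside it is useful.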

* SHARED TEXTS BETWEEN TASKS & COMMON ANNOTATION FORMATS * For many tasks, finding suitable texts for building training and testing datasets can itself be a challenge, and the choices made are often somewhat ad hoc. As a result, previous SemEval tasks have largely produced datasets independent of one another, despite the encouragement of organizers and the potential benefit of having richer semantic annotations of a corpus.

Beginning with SemEval-2015, tasks have been organized into tracks, reflecting related objectives between tasks. Starting with SemEval-2016, the program committee will strongly encourage two points of coordination between task organizers grouped into the same track. First, whenever possible, tasks in the same track should use a common dataset and annotation format to make it easy for teams to participate in multiple tasks in the same track. To make it easier for task organizers to find suitable texts, we encourage the re-annotation of datasets from previous years with new annotation types, or of texts from publicly available corpora such as OntoNotes, OpenANC, or Wikipedia. Second, task organizers in a track will be asked to coordinate, whenever possible, to produce or annotate training data that is common between the tasks. This track-shared corpus is not intended to be a task's entire dataset; our intent is to provide new research opportunities for building upon richer annotated data from related semantic phenomena.

* UMBRELLA TASKS * In order to reduce fragmentation of similar tasks and increase community effort towards solving the underlying research problems, we encourage task organizers to propose larger tasks that include several related subtasks. For example, a Word Sense Induction umbrella task might include subtasks for Japanese and English. Similarly, a Sentiment Analysis umbrella task might include subtasks for Twitter, Product Reviews, and Service Reviews. We also welcome proposals for umbrella tasks focusing on different aspects of the same phenomenon. For example, an Attitude Inference task might have subtasks for detecting an author's emotional state, the sentiment of their writing, and the writing's objectivity. In addition, the program committee will actively encourage task organizers proposing similar tasks to combine their efforts into larger umbrella tasks.

* APPLICATION-ORIENTED TASKS * We welcome tasks that are devoted to developing novel applications of computational semantics. As an analogy, the TREC Question-Answering (QA) track was devoted solely to building QA systems to compete with current IR systems. Similarly, we encourage tasks that have a clearly defined end-user application, that showcase and enhance our understanding of computational semantics, and that extend the current state of the art.


We welcome both new tasks and task reruns. For a new task, a major concern to be addressed in the proposal is whether it would be able to attract participants. For task reruns, the organizers should defend in their proposal the need for another iteration of the task, e.g., because there is a need for a new form of evaluation (such as a new metric to test new phenomena, or a new application-oriented scenario), a need to test on new types of data (e.g., social media, domain-specific corpora), or a significant expansion in scale over a previous run of the task.

In the case of a rerun, we further discourage carrying over the same subtasks year after year while adding new ones, as this can lead to the accumulation of too many subtasks. Evaluating on a different dataset with the same task formulation typically should not be considered a separate subtask.


Task proposals will be reviewed by experts, and the reviews will serve as the basis for acceptance decisions. In case of conflict, more innovative new tasks will be given preference over task reruns. In case of very similar task proposals, the selection committee will propose task mergers. If no consensus can be reached, the task with the better reviews will be given preference. If a task proposal leaves important questions open, the task may be conditionally accepted, and finally dropped if sufficient answers are not provided in time.


SemEval-2016 schedule:

Task proposals due: January 31, 2015
Task selection notification: March 5, 2015
Tasks merged: March 31, 2015
Trial data ready: May 31, 2015
Training data ready: July 30, 2015
Test data ready: December 15, 2015
Evaluation start: January 10, 2016
Evaluation end: January 31, 2016
Paper submission due: February 28, 2016 [TBC]
Paper reviews due: March 31, 2016 [TBC]
Camera-ready due: April 30, 2016 [TBC]
SemEval workshop: Summer 2016

The SemEval-2016 Workshop will be co-located with a major NLP conference in 2016.


The task proposals should be about 3-8 pages long, and should contain the following:

Summary

- A short description of the task in general (including motivation)

- In case of a rerun, justification why this is needed using the criteria discussed above

- Estimated number of participating teams and plans on how to encourage participation; this is especially important for new tasks

Data & Resources

- How the training/testing data will be built and/or procured

- What source texts/corpora are going to be used? Please discuss whether existing corpora will be re-used.

- How much data is going to be produced

- How will quality of the data be ensured and evaluated

- An example of what a data instance would look like

- The anticipated availability of the necessary resources to the participants (copyright, etc.)

- The resources required to prepare the task (computation and annotation time, costs of annotations, etc.) and their availability


Evaluation

- The evaluation methodology to be used, including clear evaluation criteria

Task organizers

- Names, affiliations, brief description of research interests and relevant experience, contact information (email).

Please submit proposals by mail in PDF format to the SemEval email address: semeval-organizers at googlegroups.com

In case you are not sure whether a task is suitable for SemEval, please feel free to get in touch to discuss your idea.

CHAIRS

Daniel Cer, Google Inc.
David Jurgens, McGill University, Montreal, Canada
Preslav Nakov, Qatar Computing Research Institute
Torsten Zesch, University of Duisburg-Essen, Germany

The SemEval DISCUSSION GROUP

Please join our discussion group at semeval3 at googlegroups.com in order to receive announcements and participate in discussions.

The SemEval-2016 WEBSITE: http://alt.qcri.org/semeval2016/
