CORBON 2017: 2nd Workshop on Coreference Resolution Beyond OntoNotes to be held at EACL 2017 (Valencia, Spain), April 4, 2017

More information: http://corbon.nlp.ipipan.waw.pl/

New submission deadline: January 23, 2017

Please also note that:

* A small number of travel awards will be available to students who have papers accepted to CORBON 2017.

* You can still take part in the shared task <http://corbon.nlp.ipipan.waw.pl/index.php/shared-task/> on coreference resolution for German and Russian.

* The submission link is available: https://www.softconf.com/eacl2017/corbon

Call for Papers

Many NLP researchers, especially those not working in the area of discourse processing, tend to equate coreference resolution with the sort of coreference addressed in MUC, ACE, and OntoNotes, and have the impression that coreference is a well-worn task, owing in part to the large number of coreference papers reporting results on the MUC/ACE/OntoNotes coreference corpora. This is an unfortunate misconception: the previous shared tasks on coreference resolution have largely focused on entity coreference, which constitutes only one of the many kinds of coreference relations that have been discussed in theoretical and computational linguistics over the past few decades. In fact, by focusing on entity coreference resolution, NLP researchers have only scratched the surface of the wealth of interesting problems in coreference resolution.

The first workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016 <http://corbon.nlp.ipipan.waw.pl/2016/>), which was held in conjunction with NAACL HLT 2016, sought to:

* encourage work on under-investigated coreference resolution tasks as well as coreference resolution in under-investigated languages and

* provide a forum for coreference researchers to discuss and present such work.

The workshop was quite successful in achieving its goals: the majority of the submissions focused on coreference resolution in less-investigated languages, and more than half of the submissions focused on under-investigated coreference tasks.

Building on the success of its previous edition, CORBON 2017 will include:

* a special theme on knowledge-rich coreference resolution;

* a shared task on coreference resolution in languages without coreference-annotated data; and

* a panel discussing the future research directions for coreference resolution.


The workshop welcomes submissions describing both theoretical and applied computational work on coreference resolution, especially for languages other than English, less-researched forms of coreference and new applications of coreference resolution. The submissions are expected to discuss theories, evaluation, limitations, system development and techniques relevant to the workshop topics. Topics of interest include but are not limited to the following:

* Coreference resolution for less-researched languages (e.g., annotation strategies, resolution modules and formal evaluation)

* Evaluation of the influence of language-specific properties, such as lack of articles, quasi-anaphora, ellipsis or complexity of reflexive pronouns, on coreference resolution

* Representation of coreferential relations other than identity coreference (e.g., bridging references, reference to abstract entities, etc.)

* Investigation of difficult cases of anaphora and coreference and their resolution by resorting to e.g. discourse-based and pragmatic levels

* Coreference resolution in noisy data (e.g. in speech and social networks)

* New applications of coreference resolution

Since progress in these under-explored coreference tasks is currently limited in part by the scarcity of annotated corpora, papers that describe the creation and annotation of corpora, especially those with less-investigated coreference phenomena and those involving less-researched languages, are particularly welcome. In addition, the program committee members will be asked to give special attention to submissions that echo our special theme on knowledge-rich coreference resolution, which, as mentioned above, involves the use of sophisticated knowledge sources for coreference resolution.

Shared Task

Previous shared tasks on coreference resolution (e.g., the SemEval 2010 shared task Coreference Resolution in Multiple Languages <http://stel.ub.edu/semeval2010-coref/>, the CoNLL 2011 <http://conll.cemantix.org/2011/introduction.html> and 2012 <http://conll.cemantix.org/2012/introduction.html> shared tasks) operated in a setting where a large amount of training data was provided to train coreference resolvers in a fully supervised manner. Our shared task has a different goal: we are primarily interested in a low-resource setting. In particular, we seek to investigate how well one can build a coreference resolver for a language for which there is no coreference-annotated data available for training.

With a rising interest in annotation projection, we hereby offer a projection-based task which will facilitate the application of existing coreference resolution algorithms to new languages. We believe that with this exciting setting, the shared task can help promote the development of coreference technologies that are applicable to a larger number of natural languages than is currently possible.

This year we will focus on two languages: German and Russian. To mimic a low-resource setting, no German or Russian coreference-annotated data will be provided. Rather, to facilitate system development, the shared task participants will be provided with two versions of an English-German-Russian parallel corpus: an unlabelled version and a labelled version. In the labelled version, the English side of the parallel corpus has been automatically coreference-labelled using the Berkeley coreference resolver, which was trained on the English OntoNotes corpus.

Participants will compete in two tracks:

1. closed track: projection-based coreference resolution on German and/or Russian. The only coreference-annotated training data that the participants can use is the English OntoNotes corpus. Alternatively, they can use any of the publicly-available coreference resolvers trained on English OntoNotes. They can then use whatever parallel corpus and method they prefer to project the English annotations into German/Russian and subsequently train a new coreference resolver on the projected annotations. Note that they do not have to use the provided English-German-Russian parallel corpus.

2. open track: coreference resolution on German and Russian with no restriction on the kind of coreference-annotated data the participants can use for training. For instance, they can label their own German/Russian coreference data and use it to train a German/Russian coreference resolver, or they can adopt a heuristic-based approach where they employ knowledge of German/Russian to write coreference rules for these languages.
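As a rough illustration of the projection step in the closed track, the following minimal Python sketch maps English mention spans onto the target side of a sentence pair through word alignments. It is only one possible approach under simplifying assumptions and is not the organizers' method: the names `project_span` and `project_clusters`, the span representation (inclusive token indices) and the alignment format (source token index mapped to a set of target token indices) are all hypothetical, and a real system would obtain alignments from a word aligner run on the parallel corpus.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices; hypothetical format


def project_span(span: Span, alignment: Dict[int, Set[int]]) -> Optional[Span]:
    """Project one source-side mention onto the target side.

    Returns the smallest target span covering every target token aligned to
    a source token inside the mention, or None if no token is aligned.
    """
    start, end = span
    targets: Set[int] = set()
    for i in range(start, end + 1):
        targets |= alignment.get(i, set())
    if not targets:
        return None
    return (min(targets), max(targets))


def project_clusters(
    clusters: List[List[Span]], alignment: Dict[int, Set[int]]
) -> List[List[Span]]:
    """Project whole coreference chains, dropping chains left with < 2 mentions."""
    projected: List[List[Span]] = []
    for cluster in clusters:
        spans: List[Span] = []
        for span in cluster:
            target = project_span(span, alignment)
            if target is not None and target not in spans:
                spans.append(target)
        if len(spans) >= 2:  # a chain with one mention is a singleton
            projected.append(spans)
    return projected


# Toy example (illustrative data only): a one-to-one alignment of four tokens.
alignment = {0: {0}, 1: {1}, 2: {2}, 3: {3}}
english_clusters = [[(0, 0), (2, 3)], [(1, 1), (5, 5)]]
print(project_clusters(english_clusters, alignment))
# [[(0, 0), (2, 3)]]  (the second chain loses its unaligned mention and is dropped)
```

The projected chains could then serve as automatically created training data for a German or Russian resolver; chains that collapse to a single mention are discarded here because singletons are not evaluated in the shared task.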

The participants can choose to take part in one or both tracks for one or both languages. Participants will run their systems on the test data themselves and are required to send their outputs to the Shared Task Coordinator by December 27, 2016 (CET). Training data <https://github.com/yuliagrishina/CORBON-2017-Shared-Task> as well as several additional resources are already available on the shared task page <http://corbon.nlp.ipipan.waw.pl/index.php/shared-task/>.

The evaluation will be done on a manually annotated German-Russian parallel corpus. The guidelines used for the annotation of the corpus are quite compatible with the OntoNotes guidelines for English (Version 6.0) in terms of types of referring expressions that are annotated.

The exceptions are that we:

a) handle only NPs and do not annotate verbs that are coreferent with NPs,

b) include appositions into the markable span and do not mark them as a separate relation,

c) annotate pronominal adverbs in German if they co-refer with an NP.

Please check our GitHub repository <https://github.com/yuliagrishina/CORBON-2017-Shared-Task> for the complete guidelines and sample annotations. As in CoNLL 2012, we will compute a number of existing scoring metrics (MUC, B-CUBED, CEAF and BLANC) and use the unweighted average of the MUC, B-CUBED and CEAF scores, computed by the official CoNLL 2012 scorer <http://conll.github.io/reference-coreference-scorers/>, to determine the winning system. We will not evaluate singletons, and we kindly ask the participants to exclude them from the submitted data.
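To make the ranking criterion and the singleton requirement concrete, here is a small Python sketch. It assumes that the three scores being averaged are F1 values, as in CoNLL 2012; the function names and the numbers are purely illustrative and are not part of the official scorer, which should be used to obtain the actual metric values.

```python
from statistics import mean
from typing import List, Tuple

Span = Tuple[int, int]  # hypothetical mention representation


def ranking_score(muc_f1: float, b_cubed_f1: float, ceaf_f1: float) -> float:
    """Unweighted average of the three scores (assumed here to be F1 values)."""
    return mean([muc_f1, b_cubed_f1, ceaf_f1])


def drop_singletons(clusters: List[List[Span]]) -> List[List[Span]]:
    """Keep only chains with at least two mentions before submitting output."""
    return [cluster for cluster in clusters if len(cluster) > 1]


# Made-up scores, for illustration only.
print(ranking_score(60.0, 50.0, 40.0))  # 50.0

# The single-mention chain [(7, 8)] is removed.
print(drop_singletons([[(0, 0), (4, 5)], [(7, 8)]]))  # [[(0, 0), (4, 5)]]
```

BLANC is reported as well but, per the description above, does not enter the average used for ranking.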

Submission instructions

We solicit previously unpublished work, presented either as long or short papers, following the style guidelines for EACL 2017, produced with the official LaTeX template (http://eacl2017.org/images/site/eacl-2017-template.zip). To be included in the final proceedings, accepted papers have to be made available both as LaTeX sources and PDF.

Long papers should have at most 8 pages of content, not including references. Short papers are limited to 4 pages of content, not including references. There is no constraint on the size of the reference list. Submissions should be anonymous and not disclose in any way the identity of the author(s). Submissions should be made using the START system (https://www.softconf.com/eacl2017/corbon/).

Important dates

December 19, 2016: Evaluation data released

December 27, 2016: System outputs collected

January 6, 2017: Shared task results announced

January 23, 2017 (extended from January 16, 2017): Workshop paper / System description paper due date

February 11, 2017: Notification of acceptance

February 21, 2017: Camera-ready papers due date

April 4, 2017: Workshop date

Program Committee

Anders Björkelund, University of Stuttgart

Antonio Branco, University of Lisbon

Chen Chen, Apple

Dan Cristea, A. I. Cuza University of Iasi

Pascal Denis, MAGNET, INRIA Lille Nord-Europe

Sobha Lalitha Devi, AU-KBC Research Center, Anna University of Chennai

Yulia Grishina, University of Potsdam

Lars Hellan, Norwegian University of Science and Technology

Veronique Hoste, Ghent University

Yufang Hou, IBM

Ryu Iida, National Institute of Information and Communications Technology (NICT), Kyoto

Ekaterina Lapshinova-Koltunski, Saarland University

Emmanuel Lassalle, Global Systematic Investors LLP, UK

Chen Li, Microsoft

Sebastian Martschat, Heidelberg University

Ruslan Mitkov, University of Wolverhampton

Costanza Navaretta, University of Copenhagen

Anna Nedoluzhko, Charles University in Prague

Michal Novak, Charles University in Prague

Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences

Constantin Orasan, University of Wolverhampton

Massimo Poesio, University of Essex

Sameer Pradhan, cemantix.org and Boulder Learning Inc.

Sam Wiseman, Harvard University

Manfred Stede, University of Potsdam

Veselin Stoyanov, Facebook

Yannick Versley, Heidelberg University

Amir Zeldes, Georgetown University

Rob Voigt, Stanford University

Desislava Zhekova, Ludwig Maximilian University of Munich

Heike Zinsmeister, University of Hamburg

Workshop Organizers

Maciej Ogrodniczuk, Linguistic Engineering Group, Institute of Computer Science, Polish Academy of Sciences

Vincent Ng, Computer Science Department, The University of Texas at Dallas

Shared Task Coordinator

Yulia Grishina, University of Potsdam

