EMNLP 2015 Workshop on Discourse in Machine Translation (DiscoMT'15)


17 September 2015 -- Lisbon, Portugal

Final call for papers - Submission deadline: 28 June 2015

It is well-known that texts have properties that go beyond those of their individual sentences and that reveal themselves in the frequency and distribution of words, word senses, referential forms and syntactic structures, including:

- document-wide properties, such as style, register, reading level and genre;

- patterns of topical or functional sub-structure;

- patterns of discourse coherence, as realized through explicit and/or implicit relations between sentences, clauses or referring forms;

- anaphoric and elliptic expressions, in which speakers exploit the previous discourse context to convey subsequent information very succinctly.

By the end of the 1990s, these properties had stimulated considerable research in Machine Translation, aimed at endowing machine-translated texts with document and discourse properties similar to those of their source texts. A period of ten years then elapsed before interest resumed in these topics, now from the perspectives of Statistical and/or Hybrid Machine Translation. This led to the first ACL Workshop on Discourse in Machine Translation (DiscoMT) in 2013, held in Sofia, Bulgaria.

Since then, SMT has itself evolved in ways that allow more access to needed linguistic knowledge, through the availability of feature-rich statistical models. We are therefore holding a second DiscoMT workshop (DiscoMT'15), this time with a complementary Shared Task (see below).

DiscoMT'15 solicits submissions on any of the following topics and for any language pair, but also welcomes submissions that link discourse studies with machine translation in some other way.

- discourse processing in support of MT, including:

. textual coherence, including anaphora, coreference, tense, aspect and modality

. textual cohesion, including lexical consistency

. discourse structure, including use of connectives and information structuring devices

. topic structure

. consistency in style and register

- MT techniques for obtaining document-level consistency and domain adaptability;

- MT techniques for structured documents;

- methods and algorithms to handle discourse-level phenomena in MT training and decoding;

- uses of MT in processing discourse-level phenomena;

- techniques for evaluating the effect of efforts targeting discourse-level phenomena in SMT;

- techniques for assessing the impact of discourse-level processing on MT quality;

- quantitative studies on the impact of discourse-level phenomena on current MT systems vs. discourse-aware ones.


We solicit previously unpublished work, presented either as long or short papers, following the ACL 2015 formatting guidelines at


Long papers should have at most 8 pages of content, not including references. Short papers are limited to 4 pages of content, not including references. There is no constraint on the size of the reference list. Submissions should be anonymous and not disclose in any way the identity of the author(s). Submissions should be made using the START system at



Submission deadline: 28 June 2015
Notification of acceptance: 21 July 2015
Final versions due: 11 August 2015
Workshop: 17 September 2015


Bonnie Webber, University of Edinburgh
Andrei Popescu-Belis, Idiap Research Institute
Marine Carpuat, University of Maryland


Ani Nenkova, University of Pennsylvania
Christian Hardmeier, Uppsala University
Jörg Tiedemann, Uppsala University
Lori Levin, Carnegie Mellon University
Lucia Specia, University of Sheffield
Mark Fishel, University of Zurich
Min Zhang, Soochow University
Preslav Nakov, Qatar Computing Research Institute


Liane Guillou, University of Edinburgh
Beata Beigman Klebanov, Educational Testing Service, New Jersey
Francisco Guzmán, Qatar Computing Research Institute, Doha, Qatar
Shafiq Joty, Qatar Computing Research Institute, Doha, Qatar
Thomas Meyer, Google, Zurich
Michal Novak, Charles University, Prague
Lucie Poláková, Charles University, Prague
Maja Popovic, DFKI, Berlin
Sara Stymne, University of Uppsala
Yannick Versley, University of Heidelberg
Marion Weller, University of Stuttgart


The DiscoMT shared task will consist of two subtasks, designed to make it interesting to both the MT and discourse communities. For the MT community, there is a practical MT task; for the discourse community, there is a classification task that requires no specific MT expertise. Both subtasks will be run on transcripts from the TED conference series. Both use the language pair English-French, which has a sufficiently high baseline performance to produce basically intelligible output, and whose two languages differ in interesting ways in their pronoun systems.

Subtask A: Pronoun-focused Translation

The first subtask is a regular end-to-end statistical machine translation (SMT) task, where participants are provided with training data for an SMT system and are asked to generate a translation of an unseen test set for the evaluation. Unlike other MT shared tasks, our primary evaluation will focus not on general MT quality, but specifically on the correctness of pronoun translation. Thanks to a grant from the European Association for Machine Translation, the evaluation of pronoun correctness will be carried out manually and is complimentary for the participants.

Subtask B: Cross-Lingual Pronoun Prediction

The second subtask requires participating systems to predict the correct translation of a source language pronoun from a small set of classes. The input data will consist of the source language text and a complete manual reference translation from which the target pronouns have been removed. The evaluation of this subtask will be fully automatic: system predictions are matched against the pronouns found in the reference translation.
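As an informal illustration of what matching predictions against reference pronouns could look like, here is a minimal Python sketch. It is not the official scorer: the class labels (`il`, `elle`, `ce`, `ils`, `OTHER`), the function name `score_predictions`, and the choice of accuracy as the metric are all assumptions made for this example. The actual class inventory and evaluation measure are defined in the shared task details.

```python
from collections import Counter

def score_predictions(predicted, reference):
    """Compare predicted pronoun classes with reference classes, position by position.

    Returns overall accuracy and per-class accuracy (keyed by reference class).
    Hypothetical helper for illustration; not the official evaluation script.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    correct = Counter()
    total = Counter()
    for pred, ref in zip(predicted, reference):
        total[ref] += 1
        if pred == ref:
            correct[ref] += 1
    n = sum(total.values())
    accuracy = sum(correct.values()) / n if n else 0.0
    per_class = {label: correct[label] / total[label] for label in total}
    return accuracy, per_class

# Toy data with assumed class labels: one entry per removed target pronoun.
reference = ["il", "elle", "ce", "ils", "OTHER"]
predicted = ["il", "il", "ce", "ils", "OTHER"]

accuracy, per_class = score_predictions(predicted, reference)
print(f"accuracy = {accuracy:.2f}")
for label, value in sorted(per_class.items()):
    print(f"{label}: {value:.2f}")
```

Reporting a per-class breakdown alongside the overall figure is a design choice for this sketch: pronoun classes tend to be unevenly distributed, so a single accuracy number can hide poor performance on rare classes.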

Further details on the shared task can be found at


Shared Task Coordinators

Christian Hardmeier, Uppsala University
Preslav Nakov, Qatar Computing Research Institute
Sara Stymne, Uppsala University
Yannick Versley, University of Heidelberg
Jörg Tiedemann, Uppsala University

