[Corpora-List] Final CFP: ACL 2007 Statistical MT Workshop

Christof Monz christof at dcs.qmul.ac.uk
Tue Mar 20 00:46:01 CET 2007

Final Call for Papers

ACL 2007


regular paper deadline: April 2, 2007
shared task results deadline: April 6, 2007

Translating documents between two different languages by computer has
been one of the oldest goals in computational linguistics. Now, armed
with vast amounts of translated text and powerful computers, we are
witnessing significant progress toward achieving that goal.

Statistical methods allow the analysis of parallel corpora and the
automatic construction of machine translation systems. For some
language pairs such as Chinese-English or Arabic-English, statistical
machine translation (SMT) systems built at research labs currently
outperform commercial systems.

This workshop focuses on statistical and hybrid methods for machine
translation and features a shared translation task. The evaluation of
machine translation systems is a growing field, and this workshop will
also focus on determining the best methodology for evaluating
translation quality, both with automatic metrics and through subjective
human evaluation.

This workshop builds on the success of the 2005 ACL Workshop on
Parallel Text and the 2006 NAACL Workshop on Statistical Machine
Translation.

Topics of interest include, but are not limited to:

* word-based, phrase-based, syntax-based SMT
* using comparable corpora for SMT
* using morphological and POS information for SMT
* integration of rule-based MT and statistical MT
* decoding
* error analysis
* evaluation techniques for MT


In addition to soliciting research papers on the topics listed above,
the workshop will also feature a shared translation task. The workshop
organizers will provide common test sets for translation between four
language pairs in both directions:

* English-German and German-English
* English-French and French-English
* English-Spanish and Spanish-English
* English-Czech and Czech-English

Participants may submit translations for any or all of the language
directions. In addition to the common test sets, the workshop
organizers will provide optional training resources, including a newly
expanded release of the Europarl corpora, and additional out-of-domain

All participants who submit entries will have their translations
evaluated. In addition to automatic scoring, we will also evaluate
translation performance by human judgment. To facilitate the human
evaluation, we will require participants in the shared task to manually
judge some of the submitted translations.

A more detailed description of the shared task (including information
about the test and training corpora, a freely available MT system, and
a number of other resources) is available from


We also provide a baseline machine translation system, whose
performance matches the best systems from last year's shared task.


Submissions will consist of regular full papers of max. 8 pages,
formatted following the ACL 2007 guidelines. Authors of regular full
papers will be required to indicate a track for their submission. In
addition, teams participating in the shared tasks will be invited to
submit short papers (max. 4 pages) describing their systems. Both
submission and review processes will be handled electronically.

We encourage individuals who are submitting research papers to
evaluate their approaches using the training resources provided by
this workshop, so that their experiments can be repeated by others
using these publicly available corpora.

Given the overlap of the paper submission time frame with that of EMNLP
2007, we accept papers that are also submitted to the EMNLP
conference, but we would like to know as soon as possible after
notification whether an accepted paper will be withdrawn.


Regular paper submissions: April 2
(shared task) Results submissions: April 6
(shared task) Short paper submissions: April 13
Notification: April 23
Camera-ready papers: May 9


Philipp Koehn (University of Edinburgh)
Christof Monz (Queen Mary, University of London)
Cameron Shaw Fordyce (Center for the Evaluation of Language
and Communication Technologies)
Chris Callison-Burch (University of Edinburgh)



Lars Ahrenberg (Linkoping University)
Francisco Casacuberta (University of Valencia)
Colin Cherry (University of Alberta)
Stephen Clark (Oxford University)
Brooke Cowan (Massachusetts Institute of Technology)
Mona Diab (Columbia University)
Chris Dyer (University of Maryland)
Andreas Eisele (University Saarbruecken)
Marcello Federico (ITC-IRST)
George Foster (Canada National Research Council)
Alex Fraser (ISI/University of Southern California)
Ulrich Germann (University of Toronto)
Rebecca Hwa (University of Pittsburgh)
Kevin Knight (ISI/University of Southern California)
Philippe Langlais (University of Montreal)
Alon Lavie (Carnegie Mellon University)
Lori Levin (Carnegie Mellon University)
Daniel Marcu (ISI/University of Southern California)
Bob Moore (Microsoft Research)
Miles Osborne (University of Edinburgh)
Michel Simard (Canada National Research Council)
Eiichiro Sumita (ATR Spoken Language Translation Research Laboratories)
Joerg Tiedemann (University of Groningen)
Christoph Tillmann (IBM Research)
Dan Tufis (Romanian Academy)
Taro Watanabe (NTT)
Dekai Wu (HKUST)
Richard Zens (RWTH Aachen)

For questions or comments, please send email to pkoehn at inf.ed.ac.uk
