THIRD WORKSHOP ON
STATISTICAL MACHINE TRANSLATION
June 19 or 20, 2008
This workshop covers statistical and hybrid methods for machine translation, and builds on the 2005 ACL Workshop on Parallel Text, the 2006 NAACL Workshop on Statistical Machine Translation, and the 2007 ACL Second Workshop on Statistical Machine Translation. The workshop will feature papers on topics related to MT, as well as two shared tasks: a shared translation task for 12 pairs of European languages, and a shared evaluation task to test automatic evaluation metrics.
Topics of interest include, but are not limited to:
* word-based, phrase-based, syntax-based SMT
* using comparable corpora for SMT
* incorporating linguistic information into SMT
* system combination
* error analysis
* manual and automatic methods for evaluating MT
* scaling MT to very large data sets
We encourage authors to evaluate their approaches to the above topics using the common data sets created for the shared translation task. In addition to scientific papers, the workshop will feature two shared tasks.
SHARED TRANSLATION TASK
The first is a shared translation task which will examine translation between the following language pairs:
* English-German and German-English
* English-French and French-English
* English-Spanish and Spanish-English
* German-Spanish and Spanish-German
* English-Czech and Czech-English
* English-Hungarian and Hungarian-English
Participants may submit translations for any or all of the language directions. In addition to the common test sets, the workshop organizers will provide optional training resources, including a newly expanded release of the Europarl corpora and out-of-domain corpora.
All participants who submit entries will have their translations evaluated. We will evaluate translation performance by human judgment. To facilitate the human evaluation we will require participants in the shared task to manually judge some of the submitted translations.
A more detailed description of the shared translation task (including information about the test and training corpora, a freely available MT system, and a number of other resources) is available from:
We also provide a baseline machine translation system, whose performance matches the best systems from last year's shared task.
SHARED EVALUATION TASK
The second task is a shared evaluation task. Participants in this task will submit automatic evaluation metrics for machine translation, which will be assessed on their ability to:
* Rank systems on their overall performance on the test set
* Rank systems on a sentence by sentence level
Participants in the shared translation task will submit translation results for a test set of a few thousand sentences. Their system outputs will be distributed to participants in the shared evaluation task along with the reference translations. The submitted metrics will be used to score and rank these translations, and we will measure how well each automatic evaluation metric correlates with the human judgments.
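To make the system-level part of this task concrete, here is a minimal sketch (in Python) of how agreement between a metric's ranking of systems and a human ranking might be measured using Spearman's rank correlation. The system names and scores are made up for illustration, and the actual submission format and scoring protocol are defined by the organizers.

def ranks(scores):
    # Map each system to its rank (1 = best). Ties are broken arbitrarily,
    # which is acceptable for this sketch but not for a real evaluation.
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {system: rank for rank, system in enumerate(ordered, start=1)}

def spearman(scores_a, scores_b):
    # Spearman's rank correlation between two score assignments over the
    # same set of systems (simplified formula, assumes no ties).
    systems = list(scores_a)
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(systems)
    d_squared = sum((ra[s] - rb[s]) ** 2 for s in systems)
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# Hypothetical scores: an automatic metric and averaged human judgments
# for four imaginary systems on the same test set.
metric_scores = {"sys-A": 0.31, "sys-B": 0.28, "sys-C": 0.35, "sys-D": 0.22}
human_scores  = {"sys-A": 3.1,  "sys-B": 2.9,  "sys-C": 3.4,  "sys-D": 2.5}

print(spearman(metric_scores, human_scores))  # 1.0 -> identical rankings

Sentence-by-sentence ranking would be assessed analogously, comparing a metric's preferences between system outputs for each sentence against the judgments collected in the manual evaluation.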
More details of the shared evaluation task (including submission formats and the collected manual evaluations from last year's workshop) are available from:
PAPER SUBMISSION INFORMATION
Submissions will consist of regular full papers of max. 8 pages, formatted following the ACL 2008 guidelines. In addition, shared task participants will be invited to submit short papers (max. 4 pages) describing their systems or their evaluation metrics. Both submission and review processes will be handled electronically.
We encourage individuals who are submitting research papers to evaluate their approaches using the training resources provided by this workshop and past workshops, so that their experiments can be repeated by others using these publicly available corpora.
IMPORTANT DATES
March 14, 2008: Regular paper submissions
March 21, 2008: Results submissions (shared translation task)
April 4, 2008: Results submissions (shared evaluation task)
April 4, 2008: Short paper submissions (both shared tasks)
April 12, 2008: Notification (both regular and short papers)
April 21, 2008: Camera-ready papers
ORGANIZERS
Chris Callison-Burch (Johns Hopkins University)
Philipp Koehn (University of Edinburgh)
Christof Monz (University of London)
Josh Schroeder (University of Edinburgh)
Cameron Shaw Fordyce