Similar Language Translation Task at WMT 2021 (co-located with EMNLP 2021)
URL: http://www.statmt.org/wmt21/similar.html
The training/dev sets are available. The test data will be released on July 12, 2021. Please visit the website for more information.
With the widespread use of MT technology, there is growing interest in training systems to translate between languages other than English (e.g., pairs of similar languages or dialects). The main challenge is how to exploit the similarity between languages to produce accurate output despite the limited amount of available parallel data.
Given the interest of the community in this topic we organize, for the third time at WMT, the shared task on "Similar Language Translation" to evaluate the performance of state-of-the-art translation systems on translating between pairs of languages from the same language family. This year we provide participants with training and testing data in multiple language pairs from three language families listed below. Evaluation will be carried out using automatic evaluation metrics and human evaluation.
The language pairs for this year are drawn from the following three language families:
- Dravidian languages: Tamil - Telugu
- Romance languages: Catalan, Spanish, Portuguese, and Romanian
- French to two similar low-resource Manding languages: Bambara and Maninka
Test data release - July 12, 2021
Submission deadline - July 19, 2021
System description paper deadline - August 5, 2021
Camera-ready - September 15, 2021
WMT and EMNLP 2021 - November 10-11, 2021
Farhad Akhbardeh, Rochester Institute of Technology
Marta Costa-jussà, Universitat Politècnica de Catalunya
Magdalena Biesialska, Universitat Politècnica de Catalunya
Christopher Homan, Rochester Institute of Technology
Santanu Pal, Wipro AI Lab
Allahsera Tapo, Rochester Institute of Technology
Valentin Vydrin, Institut National des Langues et Civilisations Orientales (INALCO)
Marcos Zampieri, Rochester Institute of Technology