[Corpora-List] CfP: The Third Workshop on Technologies for MT of Low Resource Languages (LoResMT 2020) at AACL-IJCNLP

alinak alinak at coli.uni-saarland.de
Sun Aug 2 11:59:05 CEST 2020

The Third Workshop on Technologies for MT of Low Resource Languages
AACL-IJCNLP Virtual Event (December 4-7, 2020)
https://sites.google.com/view/loresmt/


In the past few years, machine translation (MT) performance has improved significantly. With the development of new techniques such as multilingual translation and transfer learning, the use of MT is no longer a privilege reserved for users of popular languages. Consequently, there has been increasing interest in the community in expanding coverage to more languages with different geographical presence, degrees of diffusion and digitalization. However, the goal of increasing MT coverage for users speaking diverse languages is limited by the fact that MT methods demand huge amounts of data to train quality systems, which poses a major obstacle to developing MT systems for low resource languages. Therefore, developing comparable MT systems with relatively small datasets is still highly desirable.

In addition, despite the fast development of MT technologies, MT systems still rely on several NLP tools to pre-process human-generated texts into the forms required as input for MT systems and to post-process the MT output into proper textual forms in the target language. This is especially true for systems involving low resource languages. These NLP tools include, but are not limited to, several kinds of word tokenizers/de-tokenizers, word segmenters, morphological analysers, etc. The performance of these tools has a great impact on the quality of the resulting translation. There is only limited discussion of these NLP tools, their methods, their role in training different MT systems, and their coverage of the many languages of the world.

The workshop provides a discussion panel for researchers working on MT systems/methods for low resource and under-represented languages in general. We would like to help review the state of MT for low resource languages and define the most important directions. We also solicit papers dedicated to supplementary NLP tools that are used in any language and especially in low resource languages. Overview papers of these NLP tools are very welcome. It will be beneficial if the evaluations of these tools in research papers include their impact on the quality of MT output.

Topics of Interest

We solicit original research papers, review papers, and position papers on MT research for low resource languages in the workshop. Multilingual and/or cross-lingual NLP tools for low-resource languages are especially welcome.

- Research and review papers of pre-processing and/or post-processing NLP tools for MT
- Position papers on the development of pre-processing and/or post-processing tools for MT
- Word tokenizers/de-tokenizers for specific languages
- Word/morpheme segmenters for specific languages
- Alignment/Re-ordering tools for specific language pairs
- Use of morphology analyzers and/or morpheme segmenters in MT
- Multilingual/cross-lingual NLP tools for MT
- Re-usability of existing NLP tools for low resource languages
- Corpora creation and curation technologies for low resource languages
- Review of available parallel corpora for low resource languages
- Research and review papers of MT methods for low resource languages
- MT systems/methods (e.g. rule-based, SMT, NMT) for low resource languages
- Pivot MT for low resource languages
- Zero-shot MT for low resource languages
- Fast building of MT systems for low resource languages
- Re-usability of existing MT systems for low resource languages
- Machine translation for language preservation

Important Dates

September 11, 2020 - Paper submissions due
September 21-October 9, 2020 - Review period
October 23, 2020 - Notification
November 6, 2020 - Camera-ready due
December 4-5 - LoResMT workshop

Invited speakers

Grace Tang, Alp Öktem - Translators Without Borders

Organizers (listed alphabetically)

Alina Karakanta (Fondazione Bruno Kessler)
Atul Kr. Ojha (DSI, National University of Ireland Galway & Panlingua Language Processing LLP)
Chao-Hong Liu (Iconic Translation Machines)
Jade Abbott (Retro Rabbit)
Jonathan Washington (Swarthmore College)
Nathaniel Oco (Philippines)
Surafel Melaku Lakew (Fondazione Bruno Kessler)
Tommi A Pirinen (University of Hamburg)
Valentin Malykh (Huawei Noah's Ark lab and Kazan Federal University)
Varvara Logacheva (Skolkovo Institute of Science and Technology)
Xiaobing Zhao (Minzu University of China)

Paper submission

There are two types of submissions in the workshop. Research, review and position papers should be at least four (4) and at most eight (8) pages long, plus unlimited pages for references. For system demonstration papers, the limit is four (4) pages. Submissions should be formatted according to the official AACL-IJCNLP 2020 style templates (LaTeX, Microsoft Word, Overleaf). Accepted papers will be published online in the AACL-IJCNLP 2020 proceedings and will be presented at the conference either orally or as a poster.

Submissions must be anonymised and should be made using the Softconf START conference management system at https://www.softconf.com/aacl-ijcnlp2020/LowResMT. Scientific papers that have been, or will be, submitted to other venues must be declared as such, and must be withdrawn from the other venues if accepted and published at LoResMT. The review will be double-blind.

We would like to encourage authors to cite papers written in ANY language that are related to the topics, as long as both the original bibliographic items and their corresponding English translations are provided.

Previous editions

LoResMT @ MT Summit 2019 https://sites.google.com/view/loresmt/loresmt-2019
LoResMT @ AMTA 2018 https://sites.google.com/view/loresmt-2018/
