[Corpora-List] TICO-19 dataset and call for community contributions

Antonis Anastasopoulos anastasopoulos.ant at gmail.com
Tue Aug 4 04:59:12 CEST 2020

The Translation Initiative for COVID-19 <https://tico-19.github.io/> (TICO-19) is a consortium that aims to help enable the translation of COVID-19 related content into a wide range of languages, including very low-resource languages. We recently released a set of corpora that includes translations into 37 different languages (https://tico-19.github.io/testset.html), described in this paper: https://arxiv.org/abs/2007.01788.

We make a public call for community contributions to the TICO-19 project.


You can contribute by translating the TICO-19 benchmark into more
languages. Ideally, the benchmark will grow to cover as many of the world's
languages as possible! However, we note that a large portion of the
benchmark includes medical terminology that must be translated
accurately. Thus, we strongly encourage having professional translators
handle the technical content, and following a rigorous Quality Assurance
process (e.g. one similar to the process described in our paper) for the
produced translations. Reach out to us and we will be glad to provide
details on our QA process.


If you are a professional translator and have already produced COVID-19
related content, you can share your translation memory and we will combine
and release it with ours. Similarly, if you have compiled terminologies
with COVID-19 terms, or if you find errors in our published terminologies,
reach out and we will update them accordingly.


You can volunteer with Translators without Borders (TWB) if you are
fluent in at least one language other than your native language. Whether
you are interested in translating medical texts or translating for crisis
response, there are engaging projects available to suit all preferences.
Professional translators are especially encouraged to apply. To apply,
complete the Translator Application Form <https://trommons.org/register>.

All community contributions will be properly acknowledged and labeled as such.

On behalf of the TICO-19 partners,
Antonis Anastasopoulos
