[Corpora-List] Public Release of the Arab-Acquis Corpus

Nizar Habash nizar.habash at nyu.edu
Sun Dec 9 07:02:44 CET 2018


Dear All, I am happy to share that the Arab-Acquis corpus is now publicly available here: https://camel.abudhabi.nyu.edu/arabacquis/

Arab-Acquis is a large dataset for evaluating machine translation between 22 European languages and Arabic. It consists of over 12,000 sentences from the JRC-Acquis (Acquis Communautaire) corpus, translated twice by professional translators, once from English and once from French, totaling over 600,000 words.

This resource was developed at the Computational Approaches to Modeling Language (CAMeL <http://www.camel-lab.com/>) Lab at New York University Abu Dhabi <http://nyuad.nyu.edu/>.

The paper describing the effort is published here:

- Nizar Habash, Nasser Zalmout, Dima Taji, Hieu Hoang and Maverick Alzate. 2017. A Parallel Corpus for Evaluating Machine Translation between Arabic and European Languages. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL), Valencia, Spain. [PDF <http://aclweb.org/anthology/E17-2038>] [BIB <http://aclweb.org/anthology/E17-2038.bib>]

Regards

--

Nizar Habash
Associate Professor of Computer Science
New York University Abu Dhabi


