[Corpora-List] Summary: Corpus of translated material
nomi.guthmann at googlemail.com
Thu Mar 8 13:48:00 CET 2007
Dear corpora list members,
Here is the summary of the various responses on corpora of translated
material (the main requirement was to know the source language of the
The EUROPARL corpus
In its current form, it does not include information on the source
language of the various texts, but I was told that its next release
The English-Estonian and Estonian-English parallel corpus
It includes Estonian laws and EU legislation, and their translation.
The INTERSECT corpus
It includes English-French and English-German translations in several domains.
The COMPARA corpus
It includes English and Portuguese bi-directional parallel texts.
The OPUS corpus
It is an open source parallel corpus in several languages.
Jörg Tiedemann also has a corpus of aligned movie subtitles, available
for research purposes only.
The TEC corpus
A large corpus of translated English.
The Bible corpus
Corina Forascu has a section of the TimeBank 1.2 (English) corpus
translated into Romanian.
JRC-Acquis multilingual parallel corpus
A parallel corpus in several languages. The source languages in this
corpus are unknown.
The CroCo project
Corpus of German and English translations. The corpus is not available
for copyright reasons.
Many thanks for the responses:
Translation and Interpreting Studies Department
Bar Ilan University