[Corpora-List] corpus of student translations - looking for references

Andrei Popescu-Belis andrei.popescu-belis at issco.unige.ch
Wed Dec 8 14:08:00 CET 2004

Spela Vintar wrote:

> Dear Corpora Members,
>
> We have started with a group of students to compile a corpus of student
> translations, planned as a parallel corpus of original texts and several
> student translations, marked by their teachers.
> Has anyone been involved in a similar project? Any advice concerning the
> annotation scheme and corpus encoding, as well as the overall
> methodology, would be greatly appreciated.

We developed a small pilot corpus (EN->FR and FR->EN) at the School of
Translation of the University of Geneva, which is briefly described in:

Popescu-Belis A., King M. & Benantar H. (2002) - Towards a Corpus of
Corrected Human Translations. In: Handbook of the LREC 2002 Workshop
"Machine Translation Evaluation: Human Evaluators Meet Automated
Metrics", Las Palmas de Gran Canaria, Spain, p.17-21.
(the link is to the entire workbook of the workshop)

The paper gives preliminary guidelines for annotating student
translation mistakes (based on the teachers' annotations on the paper
version) and suggests some uses for such a corpus. Part of the data has
already been used in an exercise that compared human and automated
evaluation metrics. The exercise was organized at an LREC 2002 workshop
and is summarized in:

Popescu-Belis A. (2003) - An experiment in comparative evaluation:
humans vs. computers. In: Proc. of Machine Translation Summit IX, New
Orleans, LA, USA, p.307-314.

You could also look at the eCoLoRe and MeLLANGE European projects and
try to find out whether they have more precise guidelines for building a
student translation corpus, which is one of their objectives:

Best regards,
Andrei Popescu-Belis
ISSCO/TIM/ETI, Université de Genève
tél: +41 (0)22 379 86 81    40, bd. du Pont d'Arve
fax: +41 (0)22 379 86 89    1211 Genève 4 - Suisse
