[Corpora-List] Constitution

Detmar Meurers dm at ling.ohio-state.edu
Sun May 15 21:00:00 CEST 2005


Hi Jean,

> Anyway, it occurred to me that now that an aligned version exists
> (it was announced on this list the other day :
> http://logos.uio.no/opus), an interesting application would be to
> develop programs for the (semi?) automatic verification of
> translations! Has anybody done this before?

One can see this as an instance of the task of detecting variation
in corpus annotation. The variation n-gram approach for detecting
inconsistencies/errors in corpus annotation that Markus Dickinson
and I have worked on (cf. references below) should be able to do
this task for aligned parallel corpora (we included it in a recent
project proposal) - it'll be interesting to see what equivalence
classes of nuclei and contexts work best for this task.
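
To make the idea concrete, here is a minimal Python sketch. It is not the
implementation from the papers below. It assumes a word-aligned corpus
flattened to (source token, aligned target token) pairs, treats each source
token as a nucleus with a fixed window of surrounding source words as its
context, and reports the n-grams in which the same nucleus received different
translations. The function name variation_ngrams, the toy data, and the NULL
label for unaligned words are all hypothetical.

```python
from collections import defaultdict


def variation_ngrams(sentences, context=1):
    """Find nuclei that receive different labels in an identical context.

    sentences: list of sentences, each a list of (token, label) pairs.
               Here the label is assumed to be the aligned target token.
    context:   number of source tokens to include on each side of the nucleus.
    Returns a dict mapping each varying n-gram (a tuple of source tokens with
    the nucleus in the middle) to the set of labels seen for its nucleus.
    """
    labels_by_ngram = defaultdict(set)
    for sent in sentences:
        # Pad so that nuclei at sentence boundaries still have a full context.
        tokens = ["<s>"] * context + [tok for tok, _ in sent] + ["</s>"] * context
        for i, (_, label) in enumerate(sent):
            j = i + context  # position of the nucleus in the padded list
            ngram = tuple(tokens[j - context : j + context + 1])
            labels_by_ngram[ngram].add(label)
    # Keep only n-grams whose nucleus was labelled in more than one way.
    return {ng: labs for ng, labs in labels_by_ngram.items() if len(labs) > 1}


# Toy word-aligned data (hypothetical): English source, French target.
corpus = [
    [("Union", "Union"), ("law", "droit"), ("shall", "NULL"), ("apply", "applique")],
    [("Union", "Union"), ("law", "loi"), ("shall", "NULL"), ("prevail", "prime")],
    [("a", "une"), ("law", "loi"), ("of", "de"), ("nature", "nature")],
]

variations = variation_ngrams(corpus, context=1)
for ngram, labels in variations.items():
    print(" ".join(ngram), "->", sorted(labels))
```

A flagged n-gram is only a candidate: the variation may be a genuine
ambiguity rather than a translation or alignment error. Widening the context
trades recall for precision, and replacing the exact token match with
coarser equivalence classes (lemmas, say) is one way to explore the question
raised above.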

Best,
Detmar


Markus Dickinson & Detmar Meurers (2005): `Detecting Errors in
Discontinuous Structural Annotation'. Proceedings of the 43rd
Annual Meeting of the Association for Computational Linguistics
(ACL-05). Ann Arbor, Michigan.

Markus Dickinson & Detmar Meurers (2005): `Detecting Annotation
Errors in Spoken Language Corpora'. Proceedings of the Special
session on treebanks for spoken language and discourse at the 15th
Nordic Conference of Computational Linguistics (NODALIDA-05).
Joensuu, Finland.

Markus Dickinson & Detmar Meurers (2003): `Detecting Inconsistencies
in Treebanks'. Proceedings of the Second Workshop on Treebanks and
Linguistic Theories (TLT 2003). Växjö, Sweden.

Markus Dickinson & Detmar Meurers (2003): `Detecting Errors in
Part-of-Speech Annotation'. Proceedings of the 10th Conference of
the European Chapter of the Association for Computational
Linguistics (EACL-03). Budapest, Hungary.

Available from http://ling.osu.edu/~dm/papers.html


--
Detmar Meurers, Assistant Professor, Dept. of Linguistics, OSU
201a Oxley Hall, 1712 Neil Avenue, Columbus OH 43210-1298, USA
http://ling.osu.edu/~dm/ GnuPG key on web page





