[Corpora-List] comparable corpora and computer-aided translation

Dom Widdows widdows at google.com
Mon Nov 16 15:41:52 CET 2009


Dear Xiaotian,

One paper on finding translations without parallel corpora is:

  Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick and Dan Klein.
  Learning Bilingual Lexicons from Monolingual Corpora. ACL 2008.
  http://www.eecs.berkeley.edu/~aria42/pubs/acl2008-unsup-bilexicon.pdf

In general, I think there has been a lot of good work that uses language models for the target language, built from large monolingual corpora. For example, you can use a smaller parallel French-English corpus to translate into English, and a large English-only corpus to help "clean up" the output, so that the English translation is also "reasonable English". At least, that's my cartoon view of the general idea. I'm sure there are many experts out there who can enrich or correct this summary.
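To make the cartoon a little more concrete, here is a minimal sketch in Python, assuming a toy add-one-smoothed bigram language model trained on an English-only corpus and used to pick the most fluent of several candidate translations. The function names (train_bigram_lm, log_prob, pick_most_fluent) and the tiny corpus are made up for illustration. A real system would combine this score with a translation-model score and use a far larger corpus and better smoothing.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + 1
        denominator = unigrams[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def pick_most_fluent(candidates, unigrams, bigrams):
    """Return the candidate the monolingual model scores highest."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))


# Toy English-only corpus (hypothetical data).
english_corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat is on the mat",
]
unigrams, bigrams = train_bigram_lm(english_corpus)

# Two candidate outputs with the same words in different orders.
candidates = ["cat the on sat mat the", "the cat sat on the mat"]
best = pick_most_fluent(candidates, unigrams, bigrams)
print(best)
```

Both candidates contain the same words, so only the word order separates them, and the monolingual model prefers the ordering it has seen in English text.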

Best wishes, Dominic

On Sun, Nov 15, 2009 at 5:11 PM, Xiaotian Guo <garlickfred at gmail.com> wrote:
> Dear Corpora Colleagues
>
> The use of bilingual parallel corpora in computer-aided translation (CAT)
> has been widely acknowledged and applied now. I just wonder whether there
> has been substantial progress or achievement in using comparable corpora in
> CAT. I am aware of Belinda Maia's article "Some Languages are more Equal
> than Others: training translators in terminology and information retrieval
> using comparable and parallel corpora" in Corpora in Translator Education,
> 2003. Is there any other literature on this topic?
>
> If you have any ideas of how comparable corpora can be used in CAT (not
> necessarily mature), please share them with me.
>
> All the best
>
> Xiaotian Guo
>
> SOAS & New Vision Language Centre
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>


