[Corpora-List] Google's translations

Jimmy O'Regan joregan at gmail.com
Mon Mar 15 00:37:17 CET 2010


On 11 March 2010 13:18, Peter Kolb <pekoli at gmail.com> wrote:
> 3. Another interesting experiment is to let Google translate the German word
> "Ufer" (meaning "bank", but only in the waterside sense) into Czech. This
> gives "banky", which means "bank", but only in its financial sense. This can
> be explained by the observation that Google always uses English as
> interlingua (Ufer --> bank --> banky). If you directly translate e.g.
> Spanish to French you will get exactly the same result as when you first
> translate Spanish into English, and then translate the English output into
> French.
> Obviously, even for Google it is too costly to generate and maintain 52 * 51
> = 2652 translation models for all the supported language pairs. Or is it
> that they have found that X to English to Y always performs better than X to
> Y because there is so much more data available between English and X or Y
> than between X and Y?

Improving Word Alignment with Bridge Languages, Shankar Kumar, Franz Och, Wolfgang Macherey, Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007. http://www.aclweb.org/anthology-new/D/D07/D07-1005.pdf

'We show that parallel corpora in multiple languages can be exploited to improve the translation performance of a phrase-based translation system. This paper gives specific recipes for using a bridge language to construct a word alignment and for combining word alignments produced by multiple statistical alignment models.'
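
The pivoting effect Peter describes can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the three toy lexicons, their entries, and the function names are invented here, and nothing below reflects how Google's system or the paper's alignment method is actually implemented. It only shows how routing through an ambiguous English word can lose the source sense, and how the number of required models differs between direct and pivoted setups.

```python
# Toy one-word lexicons, invented for illustration only.
DE_EN = {"Ufer": "bank"}    # German -> English: waterside sense
EN_CS = {"bank": "banky"}   # English -> Czech: financial sense wins
DE_CS = {"Ufer": "břeh"}    # hypothetical direct German -> Czech entry


def translate(word, lexicon):
    """Look the word up in a lexicon; return it unchanged if unknown."""
    return lexicon.get(word, word)


def pivot_translate(word, src_to_pivot, pivot_to_tgt):
    """Translate via a pivot language by chaining two lookups."""
    return translate(translate(word, src_to_pivot), pivot_to_tgt)


def direct_model_count(n_languages):
    """Directed models needed to cover every ordered language pair."""
    return n_languages * (n_languages - 1)


def pivot_model_count(n_languages):
    """Directed models needed when every pair goes through one pivot."""
    return 2 * (n_languages - 1)


if __name__ == "__main__":
    print(pivot_translate("Ufer", DE_EN, EN_CS))   # sense lost at "bank"
    print(translate("Ufer", DE_CS))                # sense kept
    print(direct_model_count(52), pivot_model_count(52))
```

With 52 languages, the direct setup needs 52 * 51 directed models, while the pivoted setup needs only 2 * 51, which is the cost argument raised in the quoted message.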


