[Corpora-List] Google's translations
joregan at gmail.com
Mon Mar 15 00:37:17 CET 2010
On 11 March 2010 13:18, Peter Kolb <pekoli at gmail.com> wrote:
> 3. Another interesting experiment is to let Google translate the German word
> "Ufer" (meaning "bank", but only in the waterside sense) into Czech. This
> gives "banky", which means "bank", but only in its financial sense. This can
> be explained by the observation that Google always uses English as
> interlingua (Ufer --> bank --> banky). If you directly translate e.g.
> Spanish to French you will get exactly the same result as when you first
> translate Spanish into English, and then translate the English output into
> French.
> Obviously, even for Google it is too costly to generate and maintain 52 * 51
> = 2652 translation models for all the supported language pairs. Or is it
> that they have found that X to English to Y always performs better than X to
> Y because there is so much more data available between English and X or Y
> than between X and Y?
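A minimal sketch of the two points in the quoted message. It assumes a toy word-for-word lexicon in place of a real translation model. The first point is that composing through English loses the sense distinction. The second is that the number of directed models drops from n(n-1) for direct pairs to 2(n-1) with a single pivot. For 52 languages that is 52 * 51 = 2652 versus 102. The names `DE_EN`, `EN_CS`, `pivot_translate` and the two count functions are hypothetical. Nothing here claims to describe how Google's system is actually built.

```python
# Toy illustration of pivot translation and of the model-count arithmetic.
# The lexicons are hypothetical one-entry tables, not real system data.

DE_EN = {"Ufer": "bank"}   # German -> English: waterside sense only
EN_CS = {"bank": "banky"}  # English -> Czech: this toy table only knows the financial sense

def pivot_translate(word: str, src_to_pivot: dict, pivot_to_tgt: dict) -> str:
    """Translate via a pivot; the pivot word carries no sense label,
    so any ambiguity in the pivot language leaks into the output."""
    return pivot_to_tgt[src_to_pivot[word]]

def direct_model_count(n: int) -> int:
    """One directed model per ordered pair of distinct languages."""
    return n * (n - 1)

def pivot_model_count(n: int) -> int:
    """With one of the n languages as pivot, each of the other n - 1
    languages needs a model into the pivot and one out of it."""
    return 2 * (n - 1)

print(pivot_translate("Ufer", DE_EN, EN_CS))          # banky
print(direct_model_count(52), pivot_model_count(52))  # 2652 102
```

The pivot saves models, but the intermediate English word is where the German-specific sense information is lost.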
Improving Word Alignment with Bridge Languages, Shankar Kumar, Franz
Och, Wolfgang Macherey, Conference on Empirical Methods in Natural
Language Processing and Computational Natural Language Learning, 2007.
'We show that parallel corpora in multiple languages can be exploited to
improve the translation performance of a phrase-based translation system.
This paper gives specific recipes for using a bridge language to construct
a word alignment and for combining word alignments produced by multiple
statistical alignment models.'
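The simplest way to get a word alignment through a bridge language is to compose two alignments. A source word i is linked to target word j whenever some bridge word is aligned to both. The sketch below does only that, on sets of index pairs. It illustrates the idea under that assumption and is not the recipe from the paper, which also combines alignments produced by multiple statistical models. `compose_alignments` is a hypothetical name.

```python
from collections import defaultdict

def compose_alignments(src_to_bridge: set, bridge_to_tgt: set) -> set:
    """Compose two word alignments through a bridge sentence.

    Each alignment is a set of (index, index) pairs. Source word i is
    linked to target word t if some bridge word k is aligned to both.
    """
    # Index the bridge->target links by bridge position.
    by_bridge = defaultdict(set)
    for k, t in bridge_to_tgt:
        by_bridge[k].add(t)
    # Follow every source->bridge link on to the target side.
    return {(i, t) for i, k in src_to_bridge for t in by_bridge[k]}

# Hypothetical example: two source words, two bridge words, and one
# target word aligned only to the second bridge word.
print(compose_alignments({(0, 0), (1, 1)}, {(1, 0)}))  # {(1, 0)}
```

Relational composition keeps every path through the bridge, so errors in either alignment propagate. Weighting links by alignment posteriors would be one way to temper that.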