[Corpora-List] Man bites dog

Jimmy O'Regan joregan at gmail.com
Mon Nov 21 13:01:59 CET 2011


On 21 November 2011 03:15, Mike Maxwell <maxwell at umiacs.umd.edu> wrote:
> In LILT 6 (http://elanguage.net/journals/index.php/lilt/issue/current),
> "Zipf's Law and l'Arbitraire du Signe," Martin Kay discusses statistical MT,
> and says (p.22):
>
>   Notice that a language model would, and should, guarantee
>   that the French “homme mord chien” would be translated into
>   English as “dog bites man”, rather than “man bites dog”,
>   which is what it really means.
>
> I once proposed this exact example (with Spanish rather than French) to a
> computational linguist who knew more about MT than I do.  (People who know
> more about MT than I do are quite common.  Ok, they're quite common among
> computational linguists :-).)  That person suggested I needed to learn more
> about MT.
>
> It would be nice to find myself making the same mistake that Martin Kay
> made.  It would be even nicer if it weren't a mistake.
>
> Is Kay's claim correct?  The context is of course pure statistical MT, not
> hybrid rule/ statistical systems.  Assume that the pair "homme mord chien"/
> "man bites dog" never occurs in the training data, but that the reverse does
> (or at least that "dog bites man" appears on the English side, presumably
> with some significant frequency).

That idea overlooks how statistical reordering works and assumes a 'bag of words' method; it also presumes that the bigrams 'man bites' and 'bites dog' never occur. More importantly, it assumes that 'dog bites man' is the more frequent trigram in English (i.e., in the target language model), which doesn't seem to be true (http://books.google.com/ngrams/graph?content=man+bites+dog%2C+dog+bites+man&year_start=1800&year_end=2000&corpus=0&smoothing=3). That makes sense in hindsight, when you consider the idiomatic value of 'man bites dog'.

Kay's claim has a sort of metaphorical truth regarding SMT's difficulties with novelty, but it's not literally true - file it away with 'the meat is rotten, but the vodka is good' :).
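
To make the reordering point concrete, here is a toy sketch, not a real SMT system. It scores every ordering of the three translated words with a smoothed bigram language model plus a distance-based distortion penalty of the kind phrase-based decoders use. The lexicon, the bigram counts and the weights are all invented for illustration, and the counts are deliberately rigged so that the language model prefers 'dog bites man', which is the situation Kay assumes.

```python
import math
from itertools import permutations

# Hypothetical one-to-one lexicon for "homme mord chien" (an assumption).
SOURCE = ["homme", "mord", "chien"]
LEXICON = {"homme": "man", "mord": "bites", "chien": "dog"}

# Invented bigram counts standing in for the target language model.
# They are rigged so that "dog bites man" is the more frequent order.
BIGRAM_COUNTS = {
    ("<s>", "dog"): 5, ("dog", "bites"): 5, ("bites", "man"): 5, ("man", "</s>"): 5,
    ("<s>", "man"): 4, ("man", "bites"): 4, ("bites", "dog"): 4, ("dog", "</s>"): 4,
}
VOCAB = ["<s>", "</s>", "man", "bites", "dog"]


def lm_logprob(words, k=0.1):
    """Add-k smoothed bigram log-probability of a target sentence."""
    padded = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        num = BIGRAM_COUNTS.get((prev, cur), 0) + k
        den = sum(BIGRAM_COUNTS.get((prev, w), 0) for w in VOCAB) + k * len(VOCAB)
        total += math.log(num / den)
    return total


def distortion_cost(order):
    """Sum of jump distances between consecutively translated source positions."""
    cost, prev = 0, -1
    for pos in order:
        cost += abs(pos - prev - 1)
        prev = pos
    return cost


def best_translation(distortion_weight):
    """Return the highest-scoring ordering of the translated words."""
    best = None
    for order in permutations(range(len(SOURCE))):
        words = [LEXICON[SOURCE[i]] for i in order]
        score = lm_logprob(words) - distortion_weight * distortion_cost(order)
        if best is None or score > best[0]:
            best = (score, " ".join(words))
    return best[1]


if __name__ == "__main__":
    for weight in (0.0, 0.5):
        print(weight, best_translation(weight))
```

With the distortion weight at zero the model behaves like the 'bag of words' picture and the rigged language model picks its favourite order; with a modest penalty on reordering, the source order survives even though the language model still prefers the reverse. Real systems also tune these weights and use phrase tables, so this only illustrates the mechanism.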

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you


