-----Original Message----- From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Emmanuel Prochasson Sent: 09 March 2010 09:36 To: corpora at uib.no Subject: Re: [Corpora-List] Translation evaluation using word alignment
On 03/09/2010 05:06 PM, Alberto Simões wrote:
> Dear Emmanuel
> Probably not good enough for your needs, but my experiment with NATools
> was, after obtaining a decent probabilistic translation dictionary
> (using any kind of parallel corpora you can find), to use those probabilities
> to measure the likelihood of two sentences being parallel.
> How did I measure it? By searching for each word in the S(ource)
> L(anguage) sentence and checking whether a translation is present in the T(arget)
> L(anguage) sentence, then taking the average of the probabilities. Then, the same
> approach from TL to SL.
> Not fancy, but gave some interesting results.
I actually use a similar approach to find some good candidates (but I need to filter them). Instead of using a probabilistic dictionary computed from a parallel corpus, I use a regular lexicon.
The results are interesting, but typically this approach cannot see a difference between "Jon appeared on TV" and "TV appeared on Jon" (whatever the translation, say, for example, in French: "Jon est passé à la TV").
Both sentences will perfectly match the French translation. I need to go a bit deeper than the lexicon level.
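To make the limitation concrete, here is a minimal sketch of the bidirectional dictionary score described in the quoted message. The toy dictionaries, their probabilities, the function names, and the choice to average the two directions are all assumptions made for illustration; this is not NATools code.

```python
from typing import Dict, List

# Hypothetical toy dictionary type: word -> {translation: probability}
ProbDict = Dict[str, Dict[str, float]]


def directional_score(src: List[str], tgt: List[str], d: ProbDict) -> float:
    """Average, over the words of src, of the best probability of a
    translation that is present in tgt (0.0 if none is present)."""
    if not src:
        return 0.0
    tgt_words = set(tgt)
    total = 0.0
    for word in src:
        candidates = d.get(word, {})
        total += max(
            (p for t, p in candidates.items() if t in tgt_words),
            default=0.0,
        )
    return total / len(src)


def parallel_score(src: List[str], tgt: List[str],
                   s2t: ProbDict, t2s: ProbDict) -> float:
    """Assumption: the two directions are combined by a plain average."""
    return (directional_score(src, tgt, s2t)
            + directional_score(tgt, src, t2s)) / 2


# Made-up probabilities, for illustration only.
en2fr: ProbDict = {
    "jon": {"jon": 1.0},
    "appeared": {"passé": 0.5},
    "on": {"à": 0.5},
    "tv": {"tv": 1.0},
}
fr2en: ProbDict = {
    "jon": {"jon": 1.0},
    "passé": {"appeared": 0.5},
    "à": {"on": 0.5},
    "tv": {"tv": 1.0},
}

french = ["jon", "est", "passé", "à", "la", "tv"]
good = ["jon", "appeared", "on", "tv"]
swapped = ["tv", "appeared", "on", "jon"]

score_good = parallel_score(good, french, en2fr, fr2en)
score_swapped = parallel_score(swapped, french, en2fr, fr2en)
```

Because the score only checks whether a translation is present somewhere in the other sentence, word order plays no role, and `score_good` and `score_swapped` come out identical.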
In the first case, I wish to obtain something like:
Jon/Jon est passé/appeared à la/on TV/TV => 100% match
In the second case:
Jon/NULL est passé/appeared à la/on TV/NULL => 50% match
(I'm aware that in such a case any alignment algorithm is likely to be confused, but this is just an illustration.)
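One way to read the percentages in this illustration is as the fraction of aligned units that are linked to something other than NULL. The sketch below assumes that reading; the alignments are typed in by hand (no aligner is run), and the name `match_rate` is made up for the example.

```python
from typing import List, Optional, Tuple

# Each pair is (French unit, aligned English unit), with None standing for NULL.
Alignment = List[Tuple[str, Optional[str]]]


def match_rate(alignment: Alignment) -> float:
    """Fraction of units aligned to something other than NULL."""
    if not alignment:
        return 0.0
    aligned = sum(1 for _, target in alignment if target is not None)
    return aligned / len(alignment)


# Hand-written alignments copied from the illustration.
first_case: Alignment = [
    ("Jon", "Jon"),
    ("est passé", "appeared"),
    ("à la", "on"),
    ("TV", "TV"),
]
second_case: Alignment = [
    ("Jon", None),
    ("est passé", "appeared"),
    ("à la", "on"),
    ("TV", None),
]

print(f"{match_rate(first_case):.0%}")   # 100%
print(f"{match_rate(second_case):.0%}")  # 50%
```

The hard part, of course, is getting an aligner to produce the NULL links in the second case; the scoring itself is trivial once the alignment is available.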
_______________________________________________ Corpora mailing list Corpora at uib.no http://mailman.uib.no/listinfo/corpora