Regards, Sérgio
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Emmanuel Prochasson
Sent: 09 March 2010 09:36
To: corpora at uib.no
Subject: Re: [Corpora-List] Translation evaluation using word alignment
On 03/09/2010 05:06 PM, Alberto Simões wrote:
> Dear Emmanuel
>
> Probably not good enough for your needs, but my experiment with NATools
> was this: after obtaining a decent probabilistic translation dictionary
> (using any kind of parallel corpora you can find), use those probabilities
> to measure the likelihood of two sentences being parallel.
>
> How did I measure it? By searching for each word in the S(ource)
> L(anguage) sentence, checking whether a translation is present in the
> T(arget) L(anguage) sentence, and averaging the probabilities. Then the
> same approach from TL to SL.
>
> Not fancy, but gave some interesting results.
>
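The scoring described in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions, not NATools code: the function names, the nested-dict dictionary format, the choice to count a word with no translation in the other sentence as 0, and the choice to combine the two directions by a plain mean are all hypothetical.

```python
def directional_score(src_tokens, tgt_tokens, dictionary):
    """Average, over the source words, of the best translation probability
    whose translation actually occurs in the target sentence.

    `dictionary` maps a source word to {translation: probability}.
    Assumption: a word with no translation present contributes 0.
    """
    if not src_tokens:
        return 0.0
    tgt_set = set(tgt_tokens)
    total = 0.0
    for word in src_tokens:
        candidates = dictionary.get(word, {})
        total += max(
            (p for t, p in candidates.items() if t in tgt_set),
            default=0.0,
        )
    return total / len(src_tokens)


def parallel_score(src_tokens, tgt_tokens, src2tgt, tgt2src):
    """Assumption: the SL->TL and TL->SL scores are combined by their mean."""
    forward = directional_score(src_tokens, tgt_tokens, src2tgt)
    backward = directional_score(tgt_tokens, src_tokens, tgt2src)
    return (forward + backward) / 2


# Toy dictionaries with made-up probabilities, for illustration only.
EN2FR = {
    "jon": {"jon": 1.0},
    "appeared": {"passé": 0.5, "apparu": 0.4},
    "on": {"à": 0.3, "sur": 0.6},
    "tv": {"tv": 0.9},
}
FR2EN = {
    "jon": {"jon": 1.0},
    "passé": {"appeared": 0.5},
    "à": {"on": 0.3},
    "tv": {"tv": 0.9},
}

if __name__ == "__main__":
    english = "jon appeared on tv".split()
    french = "jon est passé à la tv".split()
    print(parallel_score(english, french, EN2FR, FR2EN))
```

A threshold on this score could then be used to keep or reject candidate pairs; where to put that threshold would have to be tuned on real data.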
I actually use a similar approach to find some good candidates (but I need to filter them). Instead of using a probabilistic dictionary computed from a parallel corpus, I use a regular lexicon.
The results are interesting, but typically this approach cannot see any difference between "Jon appeared on TV" and "TV appeared on Jon" when comparing either with a translation (say, for example, in French: "Jon est passé à la TV").
Both sentences will perfectly match the French translation. I need to go a bit deeper than the lexicon level.
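To make the word-order problem concrete, here is a minimal sketch, assuming a tiny hand-made lexicon and a simple coverage measure (the fraction of English words with at least one lexicon translation in the French sentence). The lexicon entries and the function name are hypothetical, chosen only for this illustration.

```python
def lexicon_coverage(src_tokens, tgt_tokens, lexicon):
    """Fraction of source tokens that have at least one lexicon
    translation present in the target sentence."""
    if not src_tokens:
        return 0.0
    tgt_set = set(tgt_tokens)
    hits = sum(1 for word in src_tokens if lexicon.get(word, set()) & tgt_set)
    return hits / len(src_tokens)


# Toy English->French lexicon, for illustration only.
TOY_LEXICON = {
    "jon": {"jon"},
    "appeared": {"passé", "apparu"},
    "on": {"à", "sur"},
    "tv": {"tv", "télé"},
}

if __name__ == "__main__":
    french = "jon est passé à la tv".split()
    correct = "jon appeared on tv".split()
    scrambled = "tv appeared on jon".split()
    print(lexicon_coverage(correct, french, TOY_LEXICON))
    print(lexicon_coverage(scrambled, french, TOY_LEXICON))
```

Both calls return the same value, because the measure only looks at which words occur and never at their positions.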
In the first case, I wish to obtain something like:
Jon/Jon est passé/appeared à la/on TV/TV => 100% match
In the second case:
Jon/NULL est passé/appeared à la/on TV/NULL => 50% match
(I'm aware that in such a case any alignment algorithm is likely to be confused, but this is just an illustration.)
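The percentages in the illustration above can be read as the share of French units that receive a non-NULL alignment. A minimal sketch under that assumption follows; the alignments are written by hand here, not produced by any aligner, and the function name and pair representation are hypothetical.

```python
def match_ratio(alignment):
    """Share of units aligned to something other than NULL.

    `alignment` is a list of (french_unit, english_unit) pairs,
    with None standing for NULL.
    """
    if not alignment:
        return 0.0
    aligned = sum(1 for _, english in alignment if english is not None)
    return aligned / len(alignment)


# Hand-written alignments mirroring the two cases in the message.
FIRST_CASE = [
    ("Jon", "Jon"),
    ("est passé", "appeared"),
    ("à la", "on"),
    ("TV", "TV"),
]
SECOND_CASE = [
    ("Jon", None),
    ("est passé", "appeared"),
    ("à la", "on"),
    ("TV", None),
]

if __name__ == "__main__":
    print(f"{match_ratio(FIRST_CASE):.0%}")   # 4 of 4 units aligned
    print(f"{match_ratio(SECOND_CASE):.0%}")  # 2 of 4 units aligned
```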
Regards,
-- Emmanuel
_______________________________________________ Corpora mailing list Corpora at uib.no http://mailman.uib.no/listinfo/corpora