[Corpora-List] Interlingual Machine Translation Systems (fwd)
Gilles.Serasset at imag.fr
Sun Nov 21 18:26:00 CET 2004
Sorry Serguei, but your mail is based on naive ideas and false
On 21 nov. 04, at 10:53, Sergey Protasov wrote:
> We should define the term "good" of MT systems.
> If we take arbitrary sentences from some very big non-specialized
> English corpus and translate them, using an expert human translator,
> we have about 80-90% correctly translated sentences.
> Let's define this as the best quality of translation.
That would mean that current measures (BLEU, ORANGE, ...) should rank
human translators as the top "systems", which apparently is not the
case.
> So "good" translation is about 45-50% of correct sentences.
THIS is naive: 100% incorrect but "understandable" sentences is
better than 50% totally unintelligible sentences (especially if the
unintelligible 50% are the sentences that are more than 7 or 8 words
long...).
Moreover, this does not take into account the purpose of the system.
For example, SYSTRAN would be considered a very bad system for the
translation of meteorological bulletins, where METEO would be
considered VERY GOOD (with your definition...). However, METEO would
never be considered a good system for wide-coverage applications, where
SYSTRAN would be considered good.
Also, we should distinguish usage, coverage, quality and potential (the
amount of effort that is needed to raise one of the criteria).
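To make the point about correct versus understandable sentences
concrete, here is a small sketch in Python. The two systems, their
labels and the counts are invented for illustration only; they are not
measurements of SYSTRAN, METEO or any real system.

```python
from collections import Counter

def rates(labels):
    """Return (share of correct sentences, share of usable sentences).

    A sentence counts as usable if it is correct or at least understandable.
    """
    counts = Counter(labels)
    n = len(labels)
    correct = counts["correct"] / n
    usable = (counts["correct"] + counts["understandable"]) / n
    return correct, usable

# Hypothetical judgments, one label per translated sentence (made-up data).
system_a = ["understandable"] * 10                    # nothing correct, all readable
system_b = ["correct"] * 5 + ["unintelligible"] * 5   # half correct, half useless

a_correct, a_usable = rates(system_a)
b_correct, b_usable = rates(system_b)

print("A: correct =", a_correct, "usable =", a_usable)
print("B: correct =", b_correct, "usable =", b_usable)
```

Under a "percentage of correct sentences" definition, system B wins
(0.5 against 0.0), yet a reader can use every sentence from system A
and only half of those from system B.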
> I think Systran and any other MT system can translate correctly not
> more than one percent of sentences, arbitrarily selected from big
Well, even if that were the case (which I doubt, if such an evaluation
is done on a fair basis), SYSTRAN would still be useful. The proof is
that, well, it IS used by many.
> This is not "good" in any case, IMHO.
Well, 2 months ago, I was going to Japan and wanted to know the
directions to Okayama University. The "how to get there" page was only
available in Japanese... Hence, I asked Systran to translate it into
English. I'm sure that the English was bad, but well, I don't read
Japanese, and English is not my mother tongue, and still I managed to
get where I wanted to go.
This is not "bad" in any case, IMHO.
If you want to have a look at Russian
Finally, if you are speaking about statistical MT, forget what I said,
as I don't know ANY statistical MT system that is used daily.
GETA-CLIPS-IMAG (UJF, INPG & CNRS)
BP 53 - F-38041 Grenoble Cedex 9
Phone: +33 4 76 51 43 80
Fax: +33 4 76 44 66 75