[Corpora-List] SMT models trained on EUROPARL

Joerg Tiedemann tiedeman at let.rug.nl
Thu Dec 16 13:20:03 CET 2004



for people interested in MT and alignment:

models for statistical machine translation trained with GIZA++ and the
EUROPARL corpus are now available from the OPUS homepage:

http://logos.uio.no/cgi-bin/opus/viewcvs.cgi/opus/EUROPARL/wordalign/

I used the standard GIZA++ settings to produce IBM Model 4. so far
you can find the models for all languages aligned to Dutch (in both
translation directions). models for other language pairs will be made
available as soon as their training is finished.

there are also files with the complete list of token links and type links
produced from the intersection of the source-to-target and target-to-source
Viterbi alignments. token links are stored as XML in files called
SRCTRG.inter.gz, and type links in files called SRCTRG.dic.gz (with SRC
and TRG replaced by the actual language codes). everything is encoded in
Unicode (UTF-8).
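
for readers unfamiliar with the intersection step, here is a minimal sketch of the idea in Python. it is an illustration only, not the scripts used for OPUS: it assumes each directional Viterbi alignment is already available as a set of position pairs, and the names intersect and type_links are hypothetical. it does not parse GIZA++ output or the XML in the .inter.gz files.

```python
from collections import Counter

def intersect(src2trg, trg2src):
    """Keep only links present in both directional alignments.

    src2trg: set of (src_pos, trg_pos) pairs (assumed representation)
    trg2src: set of (trg_pos, src_pos) pairs (assumed representation)
    """
    # Flip the target-to-source pairs so both sets use (src_pos, trg_pos).
    flipped = {(s, t) for (t, s) in trg2src}
    return sorted(src2trg & flipped)

def type_links(src_tokens, trg_tokens, links, counts=None):
    """Accumulate word-type pairs from the token links of one sentence pair."""
    counts = Counter() if counts is None else counts
    for s, t in links:
        counts[(src_tokens[s], trg_tokens[t])] += 1
    return counts

# Toy sentence pair (made up for illustration, not taken from EUROPARL).
src = ["het", "huis", "is", "klein"]
trg = ["the", "house", "is", "small"]
s2t = {(0, 0), (1, 1), (2, 2), (3, 3), (3, 2)}  # (src_pos, trg_pos)
t2s = {(0, 0), (1, 1), (2, 2), (3, 3), (0, 1)}  # (trg_pos, src_pos)

links = intersect(s2t, t2s)
counts = type_links(src, trg, links)
```

the intersection keeps only links that both directions agree on, so it tends to favour precision over recall; summing the counts over all sentence pairs would give a type-level list of word pairs.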

please let me know if this is useful for you. it would be nice to know
that this is not just a waste of hard disk space.

best regards,


Jörg

***********/\/\/\/\/\/\/\/\/\/\/\************************************
** Jörg Tiedemann tiedeman at let.rug.nl **
** Alfa-Informatica http://www.let.rug.nl/~tiedeman **
** Rijksuniversiteit Groningen Harmoniegebouw, room 1311-429 **
** Oude Kijk in 't Jatstraat 26 phone: +31 (0)50-363 5935 **
** 9712 EK Groningen fax: +31 (0)50-363 6855 **
*************************************/\/\/\/\/\/\/\/\/\/\/\**********






