[Corpora-List] A new English-Persian parallel corpus - extracted from Wikipedia

Hamed Zamani hamedzamani at acm.org
Wed May 4 00:06:24 CEST 2016


Dear all,

We have just released a new English-Persian parallel corpus extracted from Wikipedia articles. The corpus was built using the sentence alignment algorithm recently proposed in [1].

This parallel corpus is freely available for research purposes and can be found here: http://ece.ut.ac.ir/en/project/wikipedia-parallel-corpus

The English-Persian probabilistic dictionary extracted from this parallel corpus (using IBM Model 1) can also be found at the above link.
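
For readers unfamiliar with IBM Model 1, the following is a minimal sketch of how a probabilistic dictionary of this kind can be estimated with EM. It is an illustration only, not the code used to build the released dictionary: the function names, the romanized toy tokens, the NULL-word handling and the iteration count are all assumptions.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical placeholder for the empty source word


def train_ibm_model1(sentence_pairs, iterations=10):
    """Estimate t(f | e) with EM.

    sentence_pairs: list of (english_tokens, persian_tokens) tuples.
    Returns a dict mapping (f, e) to the translation probability t(f | e).
    """
    # Prepend NULL so a Persian word may align to no English word.
    pairs = [([NULL] + list(e), list(f)) for e, f in sentence_pairs]
    f_vocab = {w for _, f_sent in pairs for w in f_sent}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked
        # E-step: distribute each Persian word over the English words.
        for e_sent, f_sent in pairs:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts per English word.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return dict(t)


def best_translation(t, e_word):
    """Return the Persian word f with the highest t(f | e_word)."""
    candidates = {f: p for (f, e), p in t.items() if e == e_word}
    return max(candidates, key=candidates.get)


# Toy data with romanized Persian tokens, for illustration only.
toy_pairs = [
    (["big", "house"], ["khaneh", "bozorg"]),
    (["big", "book"], ["ketab", "bozorg"]),
    (["house"], ["khaneh"]),
]
t = train_ibm_model1(toy_pairs, iterations=20)
print(best_translation(t, "book"))
```

Each entry t[(f, e)] plays the role of one line in a probabilistic dictionary: the probability that English word e translates to Persian word f. The file format of the released dictionary is not described here, so no loader is sketched.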

For more details, please refer to the following article:

[1] H. Zamani, H. Faili, A. Shakery, "Sentence Alignment Using Local and Global Information", Computer Speech & Language (CSL), Volume 39, 2016. <http://www.sciencedirect.com/science/article/pii/S0885230816300572>

Please don't hesitate to contact us if you have any questions regarding this parallel corpus.

Regards,
Hamed Zamani


