[Corpora-List] machine translation

Mohammad Sadegh Rasooli rasooli.ms at gmail.com
Tue Dec 18 17:32:50 CET 2012


Thanks Amin,

As far as I know, the SCICT corpus is a large collection of classic novels, and the project was finished in the summer. I don't think the corpus is fully available, but if you live in Iran I think it is easy to obtain that dataset. I think you should contact the people in charge at SCICT.

Best

On Tue, Dec 18, 2012 at 11:12 AM, amin farajian <ma.farajian at gmail.com> wrote:


> Dear Karine,
> the corpus that you talked about (in Payame Noor University of Yazd) is
> actually the one which is available in ELRA. There is also another
> parallel corpus entitled PEN, which I developed. It is not yet publicly
> available, but I'm going to publish it. In the following paper you can find
> some information about it:
> Mohammad Amin Farajian (2011). PEN: Parallel English-Persian News Corpus <http://world-comp.org/p2011/ICA4953.pdf>.
> Proceedings of 2011 International Conference on Artificial Intelligence
> (ICAI'11), Nevada, USA.
>
> There are some other researchers (Dr. Khadivi at Amirkabir University, Dr.
> Faili at University of Tehran, Dr. Analoui at Iran University of Science
> and Technology) and research centers (ITRC and SCICT) in Iran that are
> working on SMT and building parallel corpora, but as far as I know their
> corpora are not available yet.
>
> Best regards,
> Amin
>
> On 12/18/2012 03:33 PM, Megerdoomian, Karine wrote:
>
> I haven’t seen any other parallel English-Persian corpora besides the
> ones already mentioned below. However, I have heard about a corpus being
> developed by the English department at Payame Noor University in Yazd,
> Iran. You may want to contact them. Here’s the info online:
> http://www.eurac.edu/it/newsevents/focus/Newsdetails.html?entryid=22181
>
> “Our developmental English-Persian parallel corpus consists of about *three
> million words* (more than 50,000 corresponding sentences in two
> languages). This is a kind of ongoing corpus, that is, an open corpus in
> which more material can be added as the need arises.”
>
> Karine
>
> *From:* corpora-bounces at uib.no [mailto:corpora-bounces at uib.no]
> *On Behalf Of* Hieu Hoang
> *Sent:* Tuesday, December 18, 2012 7:31 AM
> *To:* Khamesi Fahime
> *Cc:* corpora at uib.no
> *Subject:* Re: [Corpora-List] machine translation
>
> Hi Khamesi
>
> According to this website
> http://opus.lingfil.uu.se/
> there are 3 freely available parallel corpora for Persian-English:
> TEP
> KDE
> OpenSubtitles
>
> I've noticed other people, especially in Tehran, are also working on MT
> and collecting data, e.g.
> http://ece.ut.ac.ir/iis/resources.html
>
> Kind Regards
> Hieu
>
> On 12 December 2012 21:15, Khamesi Fahime <khamesi_fahime at yahoo.com>
> wrote:
>
> Hi,
> I am a student of Linguistics in Iran and I am working on English-to-Persian
> statistical machine translation.
>
> Unfortunately, I haven't found any EN-PER corpus except TEP and ELRA.
>
> There are many restrictions in Iran (boycott) on ordering from ELRA.
> I would appreciate it if you could help me in this respect.
>
> I am looking forward to your reply.
>
> Best regards,
>
> Khamesi
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>

--
Mohammad Sadegh Rasooli
PhD Student, Computer Science Department, Columbia University
Research Assistant, Center for Computational Learning Systems, Columbia University


