[Corpora-List] machine translation

amin farajian ma.farajian at gmail.com
Tue Dec 18 18:14:07 CET 2012


Dear Mohammad Sadegh,

Good news about the SCICT corpus. It took a long time, but I hope the resulting corpus turned out well. I am now doing my PhD at FBK-IRST, Italy, so I am not in Iran and I don't have access to the people at SCICT. Is there any other way to obtain this corpus? As far as I know, Ms Khamesi is in Bojnourd, Iran. So, if possible, please give her the information she needs to contact the SCICT people and get this corpus.

Best regards, Amin

On Tue, Dec 18, 2012 at 5:32 PM, Mohammad Sadegh Rasooli <rasooli.ms at gmail.com> wrote:


> Thanks Amin,
> As far as I know, the SCICT corpus is a large collection of classic
> novels, and the project was finished in the summer. I don't think the
> corpus is fully available, but if you live in Iran I think it's easy to
> obtain that dataset. I think you should contact the people in charge at
> SCICT.
> Best
>
> On Tue, Dec 18, 2012 at 11:12 AM, amin farajian <ma.farajian at gmail.com> wrote:
>
>> Dear Karine,
>> The corpus that you mentioned (at Payame Noor University of Yazd) is
>> actually the one that is available from ELRA. There is also another
>> parallel corpus, entitled PEN, which I developed. It is not yet publicly
>> available, but I am going to publish it. The following paper gives
>> some information about it:
>> Mohammad Amin Farajian (2011). PEN: Parallel English-Persian News Corpus <http://world-comp.org/p2011/ICA4953.pdf>.
>> Proceedings of 2011 International Conference on Artificial Intelligence
>> (ICAI'11), Nevada, USA.
>>
>> There are some other researchers (Dr. Khadivi at Amirkabir University,
>> Dr. Faili at University of Tehran, Dr. Analoui at Iran University of
>> Science and Technology) and research centers (ITRC and SCICT) in Iran
>> that are working on SMT and building parallel corpora, but as far as I
>> know their corpora are not available yet.
>>
>> Best regards,
>> Amin
>>
>> On 12/18/2012 03:33 PM, Megerdoomian, Karine wrote:
>>
>> I haven’t seen any other parallel English-Persian corpora besides the
>> ones already mentioned below. However, I have heard about a corpus being
>> developed by the English department at Payame Noor University in Yazd,
>> Iran. You may want to contact them. Here’s the info online:
>> http://www.eurac.edu/it/newsevents/focus/Newsdetails.html?entryid=22181
>>
>> “Our developmental English-Persian parallel corpus consists of about *three
>> million words* (more than 50,000 corresponding sentences in two
>> languages). This is a kind of ongoing corpus, that is, an open corpus in
>> which more material can be added as the need arises.”
>>
>> Karine
>>
>> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no]
>> On Behalf Of Hieu Hoang
>> Sent: Tuesday, December 18, 2012 7:31 AM
>> To: Khamesi Fahime
>> Cc: corpora at uib.no
>> Subject: Re: [Corpora-List] machine translation
>>
>> Hi Khamesi
>>
>> According to this website
>> http://opus.lingfil.uu.se/
>> there are three freely available parallel corpora for Persian-English:
>> TEP
>> KDE
>> OpenSubtitles
>>
>> I've noticed that other people, especially in Tehran, are also working on MT
>> and collecting data, e.g.
>> http://ece.ut.ac.ir/iis/resources.html
>>
>> Kind Regards
>> Hieu
>>
>> On 12 December 2012 21:15, Khamesi Fahime <khamesi_fahime at yahoo.com>
>> wrote:
>>
>> Hi,
>> I am a student of Linguistics in Iran and I am working on English-to-Persian
>> statistical machine translation.
>>
>> Unfortunately, I haven't found any EN-PER corpus except TEP and ELRA.
>>
>> There are many restrictions in Iran (boycott) on ordering from ELRA.
>> I would appreciate it if you could help me in this respect.
>>
>> I am looking forward to your reply.
>>
>> Best regards,
>>
>> Khamesi
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>
>>
>
>
> --
> Mohammad Sadegh Rasooli
> PhD Student, Computer Science Department, Columbia University
> Research Assistant, Center for Computational Learning Systems, Columbia
> University
>