[Corpora-List] Arabic-English Probabilistic Lex

amin farajian ma.farajian at gmail.com
Sat Dec 29 20:36:32 CET 2012


Dear Mostafa,

We were working on Persian-Arabic SMT some months ago, and one of our ideas was to use Arabic-English SMT as a bridge. We found these two free parallel Arabic-English corpora:

1. MEDAR Evaluation Package. As I remember, it is a parallel corpus extracted automatically from parallel UN documents. Because the sentence alignment was automatic, you will find some noise in it, but it is still usable. Whether the noise matters depends on your goal and the level of accuracy you need. You can find this corpus here: http://catalog.elra.info/product_info.php?products_id=1166. It is free for both academic and commercial use, and if you send them the request form, they will provide it immediately. You can also email Mr. Khalid Choukri (choukri at elda.org) directly; he can help you get this corpus and any other corpora they might have.

2. OpenSubtitles. You can find it here: http://opus.lingfil.uu.se/OpenSubtitles2011.php. It is also aligned automatically.

There are some other parallel English-Arabic corpora (such as Xinhua), but they are not free, and I think it would be a bit hard for you to buy them from Iran.

From my last discussion with Dr. Behrang Mohit, I learned that he is also working on English-to-Arabic SMT, so I think it is worth emailing him and talking to him directly.

I hope this is useful to you.

Best regards, Amin

On Sat, Dec 29, 2012 at 4:47 PM, Mostafa Dehghani <dehghani.mostafa at gmail.com> wrote:


> Dear Corpora list members,
>
> I am looking for a probabilistic Arabic-English (and English-Arabic)
> dictionary extracted from parallel or comparable corpora, but I cannot
> find any resources of this kind. Although tools exist to extract
> translations with their associated probabilities from a parallel corpus,
> there seems to be no free Arabic-English parallel corpus available.
>
> I would really appreciate any help you can give me and look forward to
> your responses.
>
> Sincerely,
>
> --Mostafa
> --
> Mostafa Dehghani, M.Sc. Student
> Intelligent Information Systems lab, Software Eng. Group,
> School of Electrical and Computer Eng.(ECE)
> University of Tehran,
> Tehran, Iran
> Tel: +9821-6111-9723