[Corpora-List] Fwd: Arabic Corpus work in Python

Lisa Hesterberg lisahesterberg2013 at u.northwestern.edu
Mon Oct 12 20:21:10 CEST 2009


---------- Forwarded message ----------
From: Majdi Sawalha <maj_sawalha at yahoo.com>
Date: Mon, Oct 12, 2009 at 11:40 AM
Subject: Re: [Corpora-List] Arabic Corpus work in Python
To: Lisa Hesterberg <lisahesterberg2013 at u.northwestern.edu>

Hi Lisa,

I would suggest using Unicode (UTF-8) for both input and output of Arabic text in Python. There is a UTF-8 copy of the CCA Arabic corpus which you can use. If you mean typing Arabic words directly into the code in IDLE, this might not work, and even if it works on one machine, it might cause problems on other machines that do not support Arabic characters. So the best way is to write the string with Unicode escapes instead, e.g. Alif is equivalent to u"\u0627". The Arabic letters run from \u0621 to \u0652, including the short vowels.
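A minimal sketch of the suggestion above, not a definitive recipe. It assumes the text is stored as UTF-8 and uses only the standard library. The helper names (is_arabic_char, write_utf8, read_utf8, round_trip) and the sample word are made up for illustration and are not part of the CCA corpus or its tools.

```python
import io
import os
import tempfile

# Alif written as a Unicode escape, so the source file stays pure ASCII.
ALIF = u"\u0627"


def is_arabic_char(ch):
    """Return True if the single character ch lies in U+0621..U+0652."""
    return 0x0621 <= ord(ch) <= 0x0652


def write_utf8(path, text):
    """Write a Unicode string to path, encoded as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_utf8(path):
    """Read a UTF-8 encoded file and return its contents as a Unicode string."""
    with io.open(path, encoding="utf-8") as f:
        return f.read()


def round_trip(text):
    """Write text to a temporary UTF-8 file and read it back (illustration only)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_utf8(path, text)
        return read_utf8(path)
    finally:
        os.remove(path)


# Hypothetical sample word built from escapes: kaf, teh, alef, beh.
sample_word = u"\u0643\u062a\u0627\u0628"
```

To read a corpus file, one would call read_utf8 with the path to the UTF-8 copy of the file; the path is left out here because it depends on where the corpus is stored.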

I hope this helps,

Majdi

------------------------------
Majdi Sawalha
Faculty of Engineering
School of Computing
University of Leeds
Leeds, LS2 9JT
UK
http://www.comp.leeds.ac.uk/sawalha
------------------------------

------------------------------
*From:* Lisa Hesterberg <lisahesterberg2013 at u.northwestern.edu>
*To:* CORPORA at uib.no
*Sent:* Mon, October 12, 2009 4:49:49 PM
*Subject:* [Corpora-List] Arabic Corpus work in Python

Hi,

I'm currently working with Python on the CCA Arabic corpus, and IDLE is giving me problems with the Arabic characters. Does anyone have any experience working with Arabic in IDLE, or is there a better way to deal with Arabic characters in Python? I would very much appreciate any help on this matter.

Thanks,

Lisa Hesterberg
Department of Linguistics
Northwestern University


