[Corpora-List] Convert Arabic PDF to txt for corpus use

Eric Atwell E.S.Atwell at leeds.ac.uk
Fri Apr 10 13:37:17 CEST 2020


Hello Mai,

An alternative solution may be to search the Web (or ask on forums) for a text transcript of your Arabic PDF book or document. For example, https://iqsaweb.wordpress.com/2016/08/22/online-corpora-of-classical-arabic-texts/ has links to several Arabic text sites: http://shamela.ws/ http://www.alwaraq.net/ http://lib.eshia.ir/ https://www.noorlib.ir/ http://shiaonlinelibrary.com/

These are mainly Classical Arabic books, so this approach may not help if you need modern Arabic documents.

I hope you are safe and well.

Eric

Eric Atwell, Professor of Artificial Intelligence for Language

PhD tutor, School of Computing, Uni of LEEDS, LS2 9JT, UK

http://www.comp.leeds.ac.uk/eric

http://qurananalysis.com http://corpus.quran.com

________________________________
From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of Mai Zaki <maizaki at gmail.com>
Sent: 10 April 2020 11:09
To: corpora at uib.no <corpora at uib.no>
Subject: [Corpora-List] Convert Arabic PDF to txt for corpus use

Hi all,

Can anyone recommend a reliable and accurate way to convert PDF files in Arabic to text files (on a Mac) for use with corpus analysis software? I tried several types of software (free trial versions) on a sample to see whether they work, but they all failed miserably. I even tried the PDF converter package in RStudio; it worked with one file but not the other, and even when it worked the output still needed quite a lot of editing. I am willing to pay for professional software, but I need to be sure that it actually works. Any advice is deeply appreciated.

Thanks,
Mai Zaki
