[Corpora-List] Handling a Large Text Archive

True Friend true.friend2004 at gmail.com
Wed Jan 4 15:57:07 CET 2012


Hi,

I have a large text archive of more than 100 million words in UTF-8 encoding (the text is not in English). Sometimes I need to produce a concordance or a word list, but the size of the archive causes problems. I've tried AntConc, which always hangs when I open the text files in it, and TextSTAT, which usually works fine for concordances but hangs when given a word-list task. Is there a good free alternative for handling big text archives? Or is there an efficient way to handle such a large collection?

Thanks a lot for taking the time to read this email. Your response will be highly appreciated.

Regards
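One possible workaround for the word-list task is to stream the files line by line, so that only the frequency table, not the whole archive, is held in memory. The sketch below assumes plain UTF-8 text files and simple whitespace tokenization with leading/trailing punctuation stripped. Whitespace splitting is chosen over a `\w+` regex because Urdu diacritics (combining marks) and the zero-width non-joiner are not word characters and would split words. The function names `word_list` and `strip_punct` are hypothetical, not part of any existing tool.

```python
import sys
import unicodedata
from collections import Counter


def strip_punct(token):
    """Trim leading/trailing characters whose Unicode category is punctuation (P*)."""
    start, end = 0, len(token)
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end]


def word_list(paths, encoding="utf-8"):
    """Count word frequencies across files, reading one line at a time.

    Memory grows with the number of distinct word types, not with corpus size.
    Tokenization is a simple assumption: split on whitespace, strip punctuation
    such as the Urdu full stop (U+06D4), and casefold (a no-op for Urdu script).
    """
    counts = Counter()
    for path in paths:
        with open(path, encoding=encoding, errors="replace") as handle:
            for line in handle:
                for raw in line.split():
                    token = strip_punct(raw).casefold()
                    if token:
                        counts[token] += 1
    return counts


if __name__ == "__main__":
    # Usage: python wordlist.py file1.txt file2.txt ...
    for word, freq in word_list(sys.argv[1:]).most_common(50):
        print(f"{freq}\t{word}")
```

For a 100-million-word corpus the number of distinct types is usually far smaller than the token count, so the table should normally fit in memory, though this depends on the language and on how aggressively tokens are normalized.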

--
*Muhammad Shakir Aziz* *محمد شاکر عزیز*
*Master in Applied Linguistics
Translator, Course Developer, Linguist for Urdu, Punjabi and English*
Urdu: http://awaz-e-dost.blogspot.com/
English: http://linguisticslearner.blogspot.com/
Facebook: http://www.facebook.com/truefriend2004
Skype: true_friend2004
