[Corpora-List] Handling a Large Text Archive

Hardie, Andrew a.hardie at lancaster.ac.uk
Wed Jan 4 17:26:49 CET 2012


Hi Muhammad,

You don’t need tagged data for CQPweb; it’s quite happy with untagged text as long as it’s in tokenised form (one token or XML tag per line).
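
As a rough illustration of the one-token-per-line layout described above, here is a minimal Python sketch. The function name verticalise and the regular expression are assumptions made for this example and are not part of CWB or CQPweb. The tokenisation is deliberately crude (runs of word characters, single punctuation marks, and whole XML tags), so a real archive, particularly one in Urdu or Punjabi, would probably need a proper tokeniser.

```python
import re

# Hypothetical pattern for this sketch: an XML tag, or a run of word
# characters, or a single non-space symbol such as punctuation.
TOKEN_RE = re.compile(r"<[^<>]+>|\w+|[^\w\s]")


def verticalise(text: str) -> str:
    """Return the text with one token or XML tag per line."""
    # findall skips whitespace because no alternative matches it.
    return "\n".join(TOKEN_RE.findall(text))


if __name__ == "__main__":
    sample = '<text id="a1">\nHello, world! It works.\n</text>'
    print(verticalise(sample))
```

Each XML tag stays intact on its own line, and every word or punctuation mark gets a line to itself. The result can be written to a file and used as input for indexing.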

Also, don’t forget that CQPweb sits on top of CWB (Corpus Workbench), which can be used from the command line without setting up the web interface, if that suits your needs better. See http://cwb.sourceforge.net/install.php for installation; the place to ask questions if you get stuck is http://devel.sslmit.unibo.it/mailman/listinfo/cwb.

best

Andrew Hardie (CQPweb developer)

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of True Friend
Sent: 04 January 2012 15:17
To: Emiliano Guevara; corpora
Subject: Re: [Corpora-List] Handling a Large Text Archive

I use C# to write small, script-like programs for processing data, but writing code even to get a concordance for a single word is a bit daunting, so I was looking for something ready-made, of the click-and-run type. I am familiar with CQPweb; I worked with it while doing research using the BE06 corpus, but I didn't use it to manage a corpus. Perhaps it will be too complex for me (the text archive is not tagged). Well, I'll give it a try.

Regards

--
Muhammad Shakir Aziz محمد شاکر عزیز
Master in Applied Linguistics
Translator, Course Developer, Linguist for Urdu, Punjabi and English
Urdu:- http://awaz-e-dost.blogspot.com/
English:- http://linguisticslearner.blogspot.com/
Facebook:- http://www.facebook.com/truefriend2004
Skype:- true_friend2004
