[Corpora-List] Handling a Large Text Archive

Adam Kilgarriff adam at lexmasterclass.com
Thu Jan 5 08:46:29 CET 2012


We are now making available an open-source version of Sketch Engine, with all the concordancing, word-list, etc. functionality but without the word sketches, so it's called NoSketch Engine. (I still have to feed my children.) It will handle corpora irrespective of size (tested up to 70 billion).

See http://nlp.fi.muni.cz/trac/noske

Adam

On 5 January 2012 07:07, Laurence Anthony <anthony0122 at gmail.com> wrote:


> >ps: For lexical coverage studies, RANGE seems to handle bigger corpora better.
>
> Although lexical coverage software is not really the focus here, a
> multiplatform (Windows, OS X, Linux) alternative to Range is
> AntWordProfiler, which is, of course, freeware:
> http://www.antlab.sci.waseda.ac.jp/software.html
>
> This can handle BNC-size corpora without any problem.
>
> Laurence.

--
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>
*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================


