[Corpora-List] corpus software

Stefan Evert stefanML at collocations.de
Fri Apr 23 21:30:31 CEST 2010

Dear corpora subscribers,

I'd like to use this opportunity to promote our public beta testing programme for the Open Corpus Workbench (CWB).

> In Menota (as in all corpora I have been involved in the development
> of or,) the Corpus Linguist Workbench (CLW/CQP) from Univ. of
> Stuttgart is the standard choice of corpus search system. However,
> CLW/CQP is old and has only been maintained and not developed the
> last 10 years (I know about the open corpus workbench initiative)

That's not quite true, even though progress has admittedly been slow and sporadic, and the official release of version 3.0 is more than 10 years late by now ... :-}

However, many bug fixes and new features have been added to the CWB during this time, and since 2008 there are 64-bit versions for Linux and Mac OS X that can handle corpora of up to 2 billion tokens.

> For example the unicode support is meager.

We[1] are currently working on two new versions of the CWB, even though 3.0 has not _quite_ been released yet:

v3.1 -- native Windows port based on work by the Textometrie project

v3.2 -- full Unicode (UTF-8) support

Version 3.1 is ready for public beta testing, so we would like to ask any CWB users who are interested in the Windows platform (or have some time to spare and access to a Windows machine) to play around with it and discover all the bugs we haven't found yet. Version 3.2 will follow soon (possibly in a less mature alpha release, so that we can test each new feature as it's added).

If you're interested in becoming a beta tester for the CWB, follow the instructions on this page:


Best regards, and thanks in advance for helping us!
Stefan Evert & Andrew Hardie

[1] That is, Andrew Hardie is doing all the hard work, while I'm playing supervisor and giving instructions. :-)
