[Corpora-List] BAWE corpus now archived and available

jasper holmes jasper.holmes at gmail.com
Mon Oct 6 17:05:26 CEST 2008


A major consideration in limiting access to this resource in particular is that, since it consists of a large collection of high-scoring university assignments, it is particularly vulnerable to abuse.

All assignments have been submitted to Turnitin, but this offers protection only at institutions that use its service.

A snippet could be released without great risk. Alternatively, the corpus has a detailed manual, which could be distributed without restriction.

Jasper

On 10/4/08, Lou's Laptop <lou.burnard at oucs.ox.ac.uk> wrote:
> I note the weasel words "non-commercial use" in the agreement Steven quotes.
> Can't speak for my colleagues in the OTA or in Essex, but my guess is that
> it's that which is making those archives (or more likely their former
> funders) anxious: it means they can bounce requests from Microsoft's
> research department (thus requiring same to apply for copies from their
> personal e-mail addresses). The world would be a simpler and probably better
> place if distributors of such resources just accepted that evil commercial
> people out there making some money out of them might not be such a bad
> thing.
>
> The suggestion about making a snippet available in advance is a good one;
> some revisions have been made to the way OTA texts are displayed on the web,
> and this might be one we could incorporate.
>
> Just my personal opinions!
>
>
> Steven Bird wrote
>
> >
> > On Sat, Oct 4, 2008 at 1:29 AM, jasper holmes <jasper.holmes at gmail.com> wrote:
> >
> >
> > > We are pleased to announce that the British Academic Written English
> > > (BAWE) corpus is now available to all researchers ...
> > > There are no restrictions on access to the corpus ...
> > >
> > >
> >
> > Except that the UK Data Archive requires users to fill in a web form,
> > which leads to:
> >
> > "Fax or post a signed copy of this form to: UK Data Archive,
> > University of Essex, Wivenhoe Park, Colchester, Essex, CO4 3SQ Fax:
> > +44 (0) 1206 872003 Upon receipt of the signed form, we will create
> > an Athens account for you within three working days. You will then
> > receive an email and will be able to register with ESDS."
> >
> > The Oxford Text Archive requires users to fill in a web form, which leads to:
> >
> > "Thank you for requesting British Academic Written English Corpus.
> > Staff at the Oxford Text Archive need to approve your request before
> > granting you access to this resource."
> >
> > These steps seem like overkill for a corpus which has generous
> > permissions: "Available for non-commercial use on condition that this
> > header is included in its entirety with any copy distributed."
> >
> > It would be helpful if UKDA and OTA didn't impose these extra barriers
> > to access for such corpora. I wonder what criteria they use in
> > approving an application. It would also be helpful if they made a
> > sample of the data available so users could see if a corpus met their
> > needs before going through the application process.
> >
> > -Steven Bird
> >
> > _______________________________________________
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
> >
> >
>
>


