[Corpora-List] Community-driven corpus building

Trevor Jenkins trevor.jenkins at suneidesis.com
Fri Apr 15 23:10:43 CEST 2011

On Fri, 15 Apr 2011, Martin Reynaert <reynaert at uvt.nl> wrote:

> Trevor Jenkins wrote:
> > On Thu, 14 Apr 2011, Martin Reynaert <reynaert at uvt.nl> wrote:
> >
> >
> >> What Stefan defines here appears to me to be a killer application for
> >> corpus building.
> >>
> >
> > Jumps up and down excitedly ... then realises he'd forget the existence of
> > such a button (and the app behind) after a few days.
> Of course, we should design our plug-in properly, facilitating people
> wont to forget about it ;0) ...

I mung headers. What has that to do with the price of fish, I hear you ask. Simply, I do that to make sure that all my replies go back to the list. And what has that to do with the price of fish, I hear you ask. Simply, it reduces the cognitive load when I'm replying. I know that as a result of my munging any message I write will go to the list (and only to the list). Then I *only* have to consider the few instances where I would need to reply off list.

The plug-in being suggested is similar in that many of the documents I type using a word processor (and I use several different ones: OpenOffice.org, Apple Pages, LyX, Scribus, depending upon the audience of the text) quite often contain data covered by national data protection legislation. (And for the US members: the European laws are much stricter.) None of those documents should ever be committed to a public archive. So we're back to the cognitive load problem. Either I have to remember to invoke the application for documents that can be public, or I have to remember not to invoke it for those that can't.

> ... So after installation and initialisation, it should be 'on' by
> default, with the optimal (from the corpus-building point of view)
> settings.

Hell no! No default for an application should ever assume opt-out.

> >> ... donating this very text ...
> >>
> >
> > You have already ``donated'' this text ... to the list's archive.
> Under the terms I build my Dutch corpus, this 'donation' is only in part
> so. ...

Except that you have actually donated your on-list replies to a collection that is not under the control of the list owners. It can be scraped by anyone with a mind to. This list (and many others) is being mirrored on gmane.org.

> Implicitly I may have given consent, however. That is if the maintainers
> of the Corpora List explicitly state somewhere in its 'terms of use'
> that any posting implies that the poster passes on his own copyright to
> the site, or (better still) that any posting will be under Creative
> Commons Licence such and so (allowing redistribution, preferably
> allowing remixing). As regards this List, I do not know. ...

I know, because I've had heated discussions with the owner and operator of gmane.org, that he considers any ``open'' mailing list (that is, one to which anyone can subscribe, even if the official archives are themselves password protected and restricted to subscribers) as fair game for gmane.

Regards, Trevor

<>< Re: deemed!
