[Corpora-List] Community-driven corpus building

Trevor Jenkins trevor.jenkins at suneidesis.com
Sat Apr 16 18:21:27 CEST 2011

On Sat, 16 Apr 2011, Martin Reynaert <reynaert at uvt.nl> wrote:

> Shareability implies IPR-settlement. The implicit consent to
> redistribution of their 'donated' texts by users of sites where the
> maintainers have the necessary statements in their sites' terms of use
> allows the corpus builder to deal with a single instance rather than with
> potentially thousands of unreachable individuals.

There is still the very real issue of people who do not believe that IPR applies to messages sent to what they personally deem open or public mailing lists. Even if the list's own terms and conditions forbid re-use or re-posting, they copy the messages over to some separate and independent system where those T&Cs are unenforceable. The maintainer of gmane.org considers a list to be open or public if he can subscribe his scraper account to it; thereafter the content is fair game for public redistribution. He will accept a request from anyone to scrape a list without contacting the list owners/operators first. The worst sort of opt-out list around.

Regards, Trevor

