[Corpora-List] Building a corpus from Twitter & Tw's privacy concerns

Leon Derczynski leon at dcs.shef.ac.uk
Thu Jul 18 11:05:54 CEST 2013

On 18 July 2013 10:33, Miguel Almeida <miguelbalmeida at gmail.com> wrote:

> Adam, Miles,
> I think another reason is so that Twitter can "black out" everyone else at
> any time in the future. It's a great (and very selfish and narrow-minded)
> idea: let the research community publish papers with your data, showing you
> how to find interesting stuff in your data (using taxpayer money!), and
> then if at some point you want to black them out, use the kill switch.
> I don't think Twitter's owners care that much about reproducible research.
> ;)

Mind you, they do seem to be quite lackadaisical when it comes to enforcing their policy - the only two instances of enforcement that I've heard of came after large corpora (millions of documents) had been distributed conspicuously for a number of years, and they didn't involve court fees, suing for damages or anything like that; in fact, the rumour was that they were fairly low-key affairs. I'm sure list members can tell us if that was not the case.

Leon
