[Corpora-List] Building a corpus from Twitter & Tw's privacy concerns

John D. Burger john at mitre.org
Wed Jul 17 16:44:24 CEST 2013

On Jul 16, 2013, at 22:49 , Angus Grieve-Smith wrote:

> On 7/16/2013 11:44 AM, John D. Burger wrote:
>> There appears to be no legal reason you can't collect a corpus of tweets. However, per Twitter's Terms of Use you cannot redistribute the tweets to others.
> At first I thought, "that's nuts." Then I thought, "well, if you consider tweets to be creative works like books and songs, it makes a certain sense." Then I concluded that that just shows how nuts our intellectual property system has become. And of course that nobody cares what a bunch of linguists think about our intellectual property system.

I Am Not A Lawyer, but as I understand it, this has nothing to do with copyright. In general tweets are not copyrightable, per several recent (US) court cases. It is simply that Twitter's developer Terms dictate that content may not be redistributed directly:

> If you provide downloadable datasets of Twitter Content or an API that returns
> Twitter Content, you may only return IDs (including tweet IDs and user IDs).
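
To make the quoted restriction concrete, here is a minimal sketch of an IDs-only export. It assumes a locally collected corpus stored as JSON Lines, one tweet object per line, with `id_str` and `user.id_str` fields; the file layout and the function names `extract_ids` and `write_id_dataset` are hypothetical, chosen only for illustration. Whether such an export satisfies the Terms is for a lawyer to judge, not this sketch.

```python
import io
import json


def extract_ids(jsonl_lines):
    """Yield (tweet_id, user_id) pairs from an iterable of JSON-encoded tweets.

    Assumes each record carries 'id_str' and a nested 'user' object with
    its own 'id_str' (an assumption about how the corpus was stored).
    """
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        yield tweet["id_str"], tweet["user"]["id_str"]


def write_id_dataset(jsonl_lines, out):
    """Write one 'tweet_id<TAB>user_id' row per tweet; the tweet text is never written.

    Returns the number of rows written.
    """
    count = 0
    for tweet_id, user_id in extract_ids(jsonl_lines):
        out.write(f"{tweet_id}\t{user_id}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Two made-up records standing in for a collected corpus.
    sample = [
        '{"id_str": "1", "text": "hello", "user": {"id_str": "9"}}',
        '{"id_str": "2", "text": "world", "user": {"id_str": "8"}}',
    ]
    buf = io.StringIO()
    write_id_dataset(sample, buf)
    print(buf.getvalue(), end="")
```

Anyone receiving such a file would have to fetch the tweet text themselves through the API, so the shared dataset contains identifiers and nothing else.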


Are these Terms legally binding? Dunno, but Twitter thinks so, and so do our lawyers. Could you sidestep them by collecting without using the API? Maybe, although it'd be hard to get certain kinds of unbiased random samples via screen scraping.

- John Burger


> BTW, did everyone catch list member Patrick Juola in the news for helping identify J.K. Rowling as the author of The Cuckoo's Calling?
> http://www.post-gazette.com/stories/news/education/duquesne-prof-helps-id-rowling-as-author-695629/
> --
> -Angus B. Grieve-Smith
> grvsmth at panix.com
