[Corpora-List] Google Books, copyrights, and corpora

Doug Cooper doug at th.net
Thu Jun 15 05:43:01 CEST 2006

Gosh ... am I the only reader who thinks that the AAP intentionally
lays out its case:


in a manner that makes it irrelevant to the use of texts in research
corpora? After all, the cornerstone of AAP's argument, repeated in
almost every paragraph of the complaint, is that Google's goal is
strictly commercial, even to the point of using the scanned copies
to "pay" for borrowing the originals (paragraph 6).

Although this argument is clearly meant to undermine a fair-use
defense by Google, it's pretty hard to see it being applied to corpus
research. One could claim that simply buying and copying a complete
text is inherently infringing, but it ain't -- no more than buying,
reading, and then going out and _reselling_ a text is (see
http://en.wikipedia.org/wiki/Bobbs-Merrill_Co_v._Straus ). The
actual use of the copyrighted material has to infringe as well,
and the fair use guidelines are intentionally written in a manner
that ensures that this case must be made.

The parallel to Napster is also hard to see. Taking a work apart,
then providing an automatic process to put it back together again,
clearly tries to make an end run around the law. But quite simple
limitations on corpus sample-serving (e.g. not allowing samples to
run over paragraph boundaries, and/or not identifying samples with
their specific sources) would make it impossible for any number of
14-year-old Python scripters to reconstitute the original texts.
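
To make those two limitations concrete, here is a minimal sketch in
Python. It is an illustration only, not a description of any existing
corpus server: the function names, the blank-line paragraph rule, and
the `window` parameter are all assumptions of mine.

```python
import re


def split_paragraphs(text):
    """Split a text into paragraphs, assuming blank lines separate them."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def serve_samples(texts, keyword, window=40):
    """Return keyword-in-context samples (hypothetical helper).

    Each sample is clipped to the paragraph it occurs in, and no source
    identifier or position is returned with it.
    """
    samples = []
    for text in texts:
        for para in split_paragraphs(text):
            for match in re.finditer(re.escape(keyword), para):
                # Clip the context window to the paragraph's own bounds.
                start = max(0, match.start() - window)
                end = min(len(para), match.end() + window)
                samples.append(para[start:end])
    # Sort so the output order does not reflect document order.
    return sorted(samples)
```

Because every sample stops at a paragraph edge and arrives without its
source or position, a client sees isolated fragments rather than pieces
that can be ordered and rejoined. A real service would presumably also
want to cap the window size and the number of samples per query.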

Bottom line, establishing that research applications of text corpora
are fair use is not a matter of 'snippet' defenses, and won't rise or fall
with Google. Rather, it's that our use and citation of text samples for
analytical purposes has little or nothing to do with the protection they
are given as creative literary works. And, as I've said before, I think
it's incumbent on us as a research community to help make this clear.

Be well,
Doug Cooper
http://sealang.net http://crcl.th.net
CRCL Inc is a US 501(c)3 nonprofit organization
