[Corpora-List] Google Books, copyrights, and corpora

Doug Cooper doug at th.net
Fri Jun 16 12:34:15 CEST 2006

It's not clear to me where the needless paranoia about US
copyright law, or the comparisons to Napster, are coming from.

First, in regard to understanding the law, the major decisions
-- Sandra Day O'Connor's rejection of 'sweat of the brow' in Feist, the
'spark of creativity' ruling in Bridgeman vs. Corel, the 'broad
transformative purpose' ruling in Kelly vs. Arriba Soft, the 'tell the
robot' ruling in Field v. Google, etc. -- are remarkably clear, to the
point, and consistent with past practice: the relevant principles are
explained and then a common-sense conclusion is drawn. Reading
the decisions is also quite helpful for learning to distinguish
between the claims the plaintiffs make on their websites,
and the points the judges actually consider to be at issue, e.g.:

Feist v Rural Telephone
Bridgeman v Corel
Kelly v Arriba Soft

Field v. Google (note page 16 para (b) in particular vis a vis the Library suit)

Secondly, in regard to comparisons with Napster, an analogously
designed musical corpus might let the researcher specify a note,
and then return that note's immediate 'collocates' as found throughout
the corpus. Samples would be limited in duration, would not necessarily
identify specific songs of origin, and might not cross arbitrary boundaries
(e.g. every nth bar or measure).
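
To make that design concrete, here is a rough sketch of such a lookup. Everything in it is an assumption made for illustration: the toy encoding of a piece as a list of note names, the function name note_collocates, and the use of fixed-length segments as a stand-in for "every nth bar". It counts the notes found next to a target note, never reaches across a segment boundary, and keeps no record of which piece a neighbour came from.

```python
from collections import Counter


def note_collocates(corpus, target, window=1, segment_len=4):
    """Count notes occurring within `window` positions of `target`.

    corpus: list of pieces, each a list of note names (an assumed toy
    encoding, not a real music-corpus format).
    Neighbours are never taken from across a segment boundary, and the
    result carries no record of which piece a neighbour came from.
    """
    counts = Counter()
    for piece in corpus:
        for i, note in enumerate(piece):
            if note != target:
                continue
            # Clamp the window to the fixed-length segment containing i.
            seg_start = (i // segment_len) * segment_len
            seg_end = min(seg_start + segment_len, len(piece))
            lo = max(seg_start, i - window)
            hi = min(seg_end, i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[piece[j]] += 1
    return counts


# Hypothetical two-piece corpus, invented purely for this example.
toy_corpus = [
    ["C", "E", "G", "C", "D", "E", "F", "G"],
    ["G", "E", "C", "E", "G", "A", "G", "E"],
]

print(note_collocates(toy_corpus, "E").most_common())
```

The output is only an aggregate tally of neighbouring notes, which is the sense in which nothing playable comes back out of the tool.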

A visit to the 'music-corpora-list' archives would confirm bona fide
research applications of such information (e.g. studying compression
technology, human perception of sound, cultural bias in composition,
etc.), and make the transformative purpose -- e.g. you can't dance
to it -- of the corpus clear.

It seems to me that such a music corpus tool could reasonably
make a case as a fair-use application, regardless of any Napster-
related decisions. More to the point, it seems to me that comparing
compilers or users of research text corpora to Napster just doesn't
make any sense -- on the contrary, what we do is much more
analogous to building and/or using Google, which is clearly protected.

Doug Cooper
