[Corpora-List] Re: Google Books, copyrights, and corpora

Péter Halacsy peter at halacsy.com
Fri Jun 16 21:47:00 CEST 2006

> What are the implications of this for corpus creation and use?

> If Google wins, does it mean that we can include *ANY* texts in a corpus,
> as long as the end user only has access to short KWIC entries
> (especially if the search interface prevents them from "chaining"
> these together to re-create larger strings of text)?

We've created a parallel corpus of English-Hungarian bitexts and
published it on the web after shuffling the texts:

"Some raw materials used for the Hunglish corpus are under copyright
(literature, film subtitles, magazines). We prevented the illegal use of
copyrighted material by shuffling the texts at sentence level. This form
is still useful for research purposes, while it does not infringe upon
the rightholders' interests. If you are a copyright holder, and you
consider the shuffled files infringing, please send email and we will
remove the material in question from the corpus.

The Hunglish corpus is open for use (with the above restrictions) under
a creative commons attributions licence."
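For anyone who wants to try the same approach, here is a rough sketch of
sentence-level shuffling for a bitext. It is not the actual Hunglish
tooling; the function name shuffle_bitext, the seed parameter and the sample
sentences are made up for illustration. It assumes the corpus is already
sentence-aligned, and it shuffles the aligned pairs together so each
English sentence keeps its Hungarian counterpart while the running text
of the source work is lost.

```python
import random


def shuffle_bitext(pairs, seed=None):
    """Return the aligned sentence pairs in random order.

    `pairs` is a sequence of (english, hungarian) tuples. Each tuple is
    moved as a unit, so the alignment survives while the original
    document order does not. (Hypothetical helper, not Hunglish code.)
    """
    rng = random.Random(seed)   # a private RNG, so global state is untouched
    shuffled = list(pairs)      # copy: leave the caller's data intact
    rng.shuffle(shuffled)       # in-place shuffle of the copy
    return shuffled


# Illustrative sample data, not taken from the corpus.
pairs = [
    ("The dog barks.", "A kutya ugat."),
    ("It is raining.", "Esik az eső."),
    ("Thank you.", "Köszönöm."),
    ("Good morning.", "Jó reggelt."),
]

for english, hungarian in shuffle_bitext(pairs):
    print(english, "\t", hungarian)
```

For a published corpus one would presumably shuffle across all source
texts at once and not keep the seed or the original ordering alongside
the released files, since either would let someone restore the running
text.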

