[Corpora-List] Texts 1900-1970: one more
smithni at exchange.lancs.ac.uk
Sun Dec 18 00:17:03 CET 2005
Dear Chris, list members,
We are nearing completion of a corpus of printed texts produced in 1931 (+/- 3 years), and have begun compiling a similar corpus of texts produced in 1901 (+/- 3 years).
Both corpora are modelled on the LOB and FLOB corpora of British English, sampling 1961 and 1991 respectively.
We expect to release the 1931 corpus next year, after clearing copyright permissions.
Geoff Leech, Nick Smith, Paul Rayson
> -----Original Message-----
> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On Behalf Of Chris Butler
> Sent: 15 December 2005 07:55
> To: corpora at hd.uib.no
> Subject: [Corpora-List] Texts 1900-1970
> My thanks to the following people, who all provided information on the
> availability of texts: Wendy Anderson, Carmela Chateau, Constantin Orasan,
> Raf Salkie, Dirk Siepmann, Pedro Ureña, Romain Vanoudheusden.
> The sources which were suggested are as follows:
> There are old (and some recent) texts at Project Gutenberg.
> The Public Library of Science has open-access texts.
> A selection of online math textbooks
> The Intratext digital library (contains many religious texts as well as a lot of literature)
> The SCOTS Corpus (which is freely accessible and searchable at
> www.scottishcorpus.ac.uk) contains texts in Scottish English (as well as
> dialects of Scots), from 1940 to the present day.
> The New York Times Archive, which goes back to the 19th century
> The collection of texts hosted by archive.org
> (http://www.archive.org/details/texts), which includes texts from Project Gutenberg
> The Victorian Literary Studies archive at
> http://victorian.lang.nagoya-u.ac.jp/index.html, which has a list of authors
> at http://victorian.lang.nagoya-u.ac.jp/concordance.html
> The archive at www.questia.com
> I'd also like to mention the Corpus of Late Modern English Texts compiled by
> Hendrik de Smet at the Catholic University of Leuven
> (http://perswww.kuleuven.be/~u0044428/), a principled collection of texts
> (10 million words, 1720-1920) drawn from archives such as Project Gutenberg
> and the Oxford Text Archive. A username and password must be obtained from
> Hendrik (Hendrik.desmets at arts.kuleuven.be) in order to access the corpus.
> Chris Butler
> Honorary Professor, University of Wales Swansea, UK