[Corpora-List] Texts 1900-1970

Chris Butler csblists at telefonica.net
Thu Dec 15 09:26:00 CET 2005

My thanks to the following people, who all provided information on the
availability of texts: Wendy Anderson, Carmela Chateau, Constantin Orasan,
Raf Salkie, Dirk Siepmann, Pedro Ureña, Romain Vanoudheusden. The sources
which were suggested are as follows:

There are old (and some recent) texts at Project Gutenberg.

The Public Library of Science has open-access texts.

A selection of online math textbooks

The IntraText Digital Library (contains many religious texts, as well as a
lot of literature)

The SCOTS Corpus (which is freely accessible and searchable at
www.scottishcorpus.ac.uk) contains texts in Scottish English (as well as
dialects of Scots), from 1940 to the present day.

The New York Times Archive
(http://pqasb.pqarchiver.com/nytimes/advancedsearch.html) goes back to 19th

The collection of texts hosted by archive.org
(http://www.archive.org/details/texts) includes texts from the Gutenberg

The Victorian Literary Studies archive at
http://victorian.lang.nagoya-u.ac.jp/index.html, which has a list of authors
at http://victorian.lang.nagoya-u.ac.jp/concordance.html

The archive at www.questia.com


I'd also like to mention the Corpus of Late Modern English Texts compiled by
Hendrik de Smet at the Catholic University of Leuven
(http://perswww.kuleuven.be/~u0044428/), a principled collection of texts
(10 million words, 1720-1920) drawn from archives such as Project Gutenberg
and the Oxford Text Archive. A username and password must be obtained from
Hendrik (Hendrik.desmets at arts.kuleuven.be) in order to access the corpus.

Chris Butler
Honorary Professor, University of Wales Swansea, UK
