(With apologies for cross-posting)
The Visualising English Print project (UW-Madison, University of Strathclyde, and the Folger Shakespeare Library) has just released plain-text files of the unrestricted TCP corpora with standardized spelling.
You can find out more, including the curation methods for these corpora, at the project website: http://vep.cs.wisc.edu. To go straight to the download page, see http://graphics.cs.wisc.edu/WP/vep/tcp/.
Over the next few weeks we will be working to further standardize spelling in the corpora, so stay tuned for announcements. Don't hesitate to contact Deidre Stuffer (stuffer at wisc.edu) with questions.
We look forward to seeing what you do with the corpora!
Best wishes, Heather Froehlich, on behalf of the Visualising English Print Team
-- Heather Froehlich
w // http://hfroehli.ch t // @heatherfro