[Corpora-List] old/modern english corpus data??

Adam Kilgarriff adam at lexmasterclass.com
Fri Nov 9 13:56:59 CET 2012


Dear Jungsoo (and others),

We have the Corpus of English Dialogues <http://www.engelska.uu.se/Research/English_Language/Research_Areas/Electronic_Resource_Projects/A_Corpus_of_English_Dialogues/> and the Penn Historical Corpora <http://www.ling.upenn.edu/histcorpora/> in the Sketch Engine <http://sketchengine.co.uk>, so you can search them comfortably and flexibly without any setup overhead.

(We worked with Jonathan Culpeper on the former and Tony Kroch on the latter. For PHC you'll first need to get a licence from Tony Kroch.)

If there are other corpora that people want SkE access to, do let us know, and we'll be happy to add them to the resources available in the Sketch Engine.

Regards

Adam

On 9 November 2012 10:21, George Walkden <george.walkden at manchester.ac.uk> wrote:


> Dear Jungsoo,
>
> There's also the Parsed Corpus of Early English Correspondence (PCEEC),
> freely available via the Oxford Text Archive:
> http://www-users.york.ac.uk/~lang22/PCEEC-manual/index.htm.
>
> It has 2.2 million words from 1410-1695. A bit earlier than the ones Kat
> mentions, but it has the advantage of being POS-tagged (though not
> lemmatized).
>
> Best,
>
> - George
>
> :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> George Walkden
> Lecturer in English Linguistics
> University of Manchester
> george.walkden at manchester.ac.uk
> http://personalpages.manchester.ac.uk/staff/george.walkden/
> Office: N1.2 Samuel Alexander Building
> Tel.: +44 (0)161 275 8905
> :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
>
> On 9 Nov 2012, at 01:00, "K Gupta" <k.e.gupta at gmail.com> wrote:
>
> Dear Jungsoo,
>
> You may find the following helpful:
>
> *Corpus of Late Modern English Texts* -
> https://perswww.kuleuven.be/~u0044428/
> It comprises two sections: the Corpus of Late Modern English Texts
> (CLMET) and the Corpus of Late Modern English Texts Extended Version
> (CLMETEV). Both comprise texts arranged in the following time periods:
> 1710-1780, 1780-1850, and 1850-1920. The texts are varied in terms of
> genre, ranging from personal letters to literary fiction to scientific
> writing, but the corpus inevitably has more formal prose.
>
> *Zurich English Newspaper Corpus* -
> http://www.helsinki.fi/varieng/CoRD/corpora/ZEN/index.html
> 349 complete newspaper issues published between 1661 and 1791,
> containing 1.6 million words.
>
> *The Lampeter Corpus of Early Modern English Tracts* -
> http://ota.ox.ac.uk/headers/2400.xml
> Tracts and pamphlets published between 1640 and 1740, organised into the
> categories of religion, politics, economy and trade, science, law and
> miscellaneous. There are 120 different texts, amounting to 1.1 million
> words
>
>
> Best wishes,
> Kat
>
> On 9 November 2012 00:23, Jungsoo Kim <jungsookim0845 at gmail.com> wrote:
>
>> Does anyone know where to find freely available online old-/modern-
>> English corpora with data from before 1800 (Google Books corpora are not
>> ideal for me)? It would be more than wonderful if they had a search
>> function that enables us to search the data by word, lemma, and part of
>> speech.
>>
>> I would be really grateful for any sort of help,
>> Jungsoo
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>

--
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>
*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================


