[Corpora-List] Searching for an email corpus - SUMMARY

Ute Römer ute.roemer at engsem.uni-hannover.de
Wed Apr 11 20:41:00 CEST 2007


Dear All,

Here is a quick summary of the messages I got in response to my recent query
on email corpora. I'd like to thank the following list members for helpful
pointers:
Stefan Bordag
Chris Jordan
Sabine Bartsch
Ramesh Krishnamurthy

Stefan Bordag mentioned the (huge) USENET corpus, which contains not emails
but texts of a similar type (from an internet discussion forum):
http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html

Chris Jordan suggested the SpamAssassin Corpus
(http://spamassassin.apache.org/).

Sabine Bartsch and Ramesh Krishnamurthy sent me a link to the Wolverhampton
junk email corpus (http://clg.wlv.ac.uk/projects/junk-email/); Sabine also
mentioned the email messages corpus from W3C lists
(http://tides.umiacs.umd.edu/webtrec/trecent/parsed_w3c_corpus.html).

I have now got plenty of corpus material to keep my 'Analysing Texts'
students busy... Thanks!

Very best wishes... Ute


************************************************************

Dr. Ute Römer
English Department
Leibniz University of Hanover
Königsworther Platz 1
30167 Hannover
Germany

Phone: +49 (0)511 762 2997
Fax: +49 (0)511 762 2996
Please note NEW e-mail address: ute.roemer at engsem.uni-hannover.de
http://www.uteroemer.com
http://www.engsem.uni-hannover.de/angli/


