[Corpora-List] mailing list corpora

Paula Newman paulan at earthlink.net
Fri Jun 16 04:25:01 CEST 2006

There's a standard "20 Newsgroups" distribution. One source (the first hit from
Google) is


> [Original Message]
> From: Adam ENDRODI <borso at vekoll.saturnus.vein.hu>
> Date: 6/15/2006 6:10:59 PM
> Subject: [Corpora-List] mailing list corpora

> Hello there,
>
> For a light survey of communication style on the Internet I need some
> 10,000 emails submitted to mailing lists or newsgroups. Language and
> topic wouldn't matter as long as they are written more or less in Latin
> script (I mean English, German, French, Spanish, Polish, etc.).
> I'm interested mainly in non-IT-related lists.
>
> In the beginning I thought it would be a trivial task to find
> archives suitable for my needs, but after visiting a couple of public
> archive sites I realized they don't provide robot-friendly
> access to thousands of mails (mboxes), presumably for
> fear of address harvesters. Archive.info told me French law forbids
> the publication of raw emails. The op at mail-archive.com suggested
> that I should google around. So I looked around, only to find
> that mostly IT-related groups (software developers and such) make their
> archives available in mbox format, which does not cover all of my needs.
>
> Thoughts, hints? Have you run into similar problems, or am I indeed
> the only one missing the obvious?
>
> Thanks in advance,
> adam
>
> PS: Please CC me on your reply if you don't mind; I'm not a list member.
> Excuse the inconvenience.


> [ You can read the messages at: https://mailman.uib.no/public/corpora/
> Listadm Corpora ]

