[Corpora-List] mailing list corpora

Niels Ott niels at drni.de
Fri Jun 16 00:13:00 CEST 2006


Adam ENDRODI wrote:

> Thoughts, hints? Have you run into similar problems or indeed I am
> the only one to miss the obvious?


Here's my idea:

- Work on Usenet data.
- Do not use archives. If you take postings from a large
  number of high-traffic groups, you should easily get
  your 10,000 postings.
- Use Mozilla Thunderbird.
- Create a newsgroup account and subscribe to a number of
  groups.
- For each group:
  - Download a lot of headers (you will be asked
    when you click on the group's name for the first
    time).
  - Go to the menu "Edit" -> "Newsgroup Properties",
    click on the "Offline" tab, and click the "Download
    now" button.
  - Wait. (This can take a while...)
- Result: in ~/.thunderbird/<someID>/News/<newsaccountname>
  you will find an mbox file for each newsgroup.
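
Once those mbox files are on disk, turning them into a list of postings can be sketched in Python with the standard mailbox module. This is only a minimal sketch under some assumptions: every regular file in the account folder is either a group's mbox file or one of Thunderbird's .msf summary files (which are skipped), only text/plain parts are kept, and a posting with no declared or an unknown charset is decoded as latin-1. The function names message_text and collect_postings are hypothetical, and the default limit simply mirrors the 10,000 postings mentioned above.

```python
import mailbox
from pathlib import Path


def message_text(msg):
    """Return the decoded text/plain parts of a posting, or '' if there are none."""
    chunks = []
    # walk() yields the message itself and, for multipart postings, every subpart.
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes with transfer encoding undone
        if payload is None:
            continue
        # Assumption: fall back to latin-1, which can decode any byte sequence.
        charset = part.get_content_charset() or "latin-1"
        try:
            chunks.append(payload.decode(charset, errors="replace"))
        except LookupError:  # charset name not known to Python
            chunks.append(payload.decode("latin-1", errors="replace"))
    return "\n".join(chunks)


def collect_postings(news_dir, limit=10_000):
    """Read the per-group mbox files in news_dir until `limit` postings are collected."""
    postings = []
    for path in sorted(Path(news_dir).iterdir()):
        if not path.is_file() or path.suffix == ".msf":
            continue  # skip subfolders and Thunderbird's .msf summary files
        box = mailbox.mbox(str(path), create=False)
        try:
            for msg in box:
                text = message_text(msg)
                if text.strip():
                    postings.append({
                        "group": path.name,  # the mbox file is named after the group
                        "subject": str(msg.get("Subject", "")),
                        "text": text,
                    })
                if len(postings) >= limit:
                    return postings
        finally:
            box.close()
    return postings
```

Pointing collect_postings at your own ~/.thunderbird/<someID>/News/<newsaccountname> folder (with your profile ID and account name filled in) would give back up to 10,000 dictionaries holding the group name, subject and body text of each posting. Stopping at a fixed limit keeps memory use bounded when the groups are large; drop the limit if you want everything that was downloaded.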

Best,

Niels

(Still CL Student at Tübingen Univ.)

--
Me & Myself: http://www.drni.de/niels/
"Freedom's just another word for nothing left to lose..." (Janis Joplin)