[Corpora-List] Looking for genre-specific corpora

Marina Santini marinamailinglists at gmail.com
Tue Apr 26 14:06:11 CEST 2011


Dear CorporaList members,

I am doing some research on concept extraction from different types of texts or genres.

I am looking for free research corpora belonging to the following genres:

1) FAQs (I have already downloaded some small collections, but I would like to have a more comprehensive range of topics)
2) Chat log transcripts (I have already downloaded the NPS Collection, 3 Codiac datasets and several smallish Many Eyes datasets)
3) Telephone conversation transcripts (missing)
4) Emails (I have already downloaded the Enron dataset and a couple of junk mail collections)
5) Twitter post corpora (missing; apparently the Edinburgh Twitter corpus is no longer available)
6) Corporate weblog corpora (missing)

I will be glad to share all the links and related documentation once I have collected all the genres in the list.

Thanks in advance for your suggestions.

Best Regards

--
Marina Santini
Researcher at Artificial Solutions


