[Corpora-List] Corpus from Blogs required.

Trilok Khairnar trilokgk at gmail.com
Wed Mar 30 14:04:00 CEST 2005


Is a corpus extracted from a variety of blogs available online (for
academic use)?
I would like to tag the texts in such a corpus and perform stylistic
analysis on them.

Alternatively, is there an API for this blog post text extraction task?
The XML-RPC API for Waypath (http://www.waypath.com/apis/) looks good,
but it seems that it doesn't return the full text of posts, and the
available documentation is not very detailed.

In the absence of such a corpus and APIs, I am thinking of doing this by:
1] using RSS and Atom feed parsers on some OPML files to get URLs for blog posts
2] extracting the text from each post (easier if the blog template format is known)
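
To make the two steps above concrete, here is a minimal sketch using only
the Python standard library. It is an illustration under stated assumptions,
not a tested pipeline: the function names are hypothetical, the feeds are
assumed to be RSS 2.0 or Atom 1.0 (other feed versions use different
element names or namespaces), and the text extraction is template-agnostic,
so it also keeps navigation and sidebar text that a template-aware
extractor would drop. Fetching the documents over HTTP is left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumption: Atom 1.0 namespace; older Atom drafts use a different one.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_urls_from_opml(opml_text):
    """Step 1a: collect the xmlUrl of every <outline> in an OPML document."""
    root = ET.fromstring(opml_text)
    return [o.attrib["xmlUrl"] for o in root.iter("outline")
            if "xmlUrl" in o.attrib]


def post_urls_from_feed(feed_text):
    """Step 1b: collect post URLs from an RSS 2.0 or Atom 1.0 feed."""
    root = ET.fromstring(feed_text)
    urls = []
    # RSS 2.0: <item><link>URL</link></item>
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            urls.append(link.strip())
    # Atom 1.0: <entry><link rel="alternate" href="URL"/></entry>
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                urls.append(link.get("href"))
                break
    return urls


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def text_from_html(html_text):
    """Step 2: crude, template-agnostic text extraction from a post page."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return " ".join(parser.parts)
```

If the blog template is known, text_from_html could be restricted to the
element that wraps the post body instead of the whole page, which should
give a cleaner corpus for tagging.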

Thanks and Regards,
