[Corpora-List] Corpus from Blogs required.
trilokgk at gmail.com
Wed Mar 30 14:04:00 CEST 2005
Is a corpus extracted from a variety of blogs available online (for
I would like to tag the texts in such a corpus and perform stylistic analysis on them.
Alternatively, is there an API for extracting the text of blog posts?
The XML-RPC API for Waypath (http://www.waypath.com/apis/) looks good,
but it seems not to return the full text of posts, and the available
documentation is not very detailed.
In the absence of such a corpus or API, I am thinking of doing this by:
1] running RSS/Atom feed parsers over the feeds listed in some OPML files to get the URLs of blog posts;
2] extracting the text of each post (easier if the blog template format is known).
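The two-step plan above can be sketched with Python's standard library alone. This is a minimal sketch under stated assumptions, not a tested crawler: it assumes OPML outlines carry feed URLs in `xmlUrl` attributes, that feeds are RSS 2.0 or Atom 1.0, and that each post's body sits inside one known template element. The `div`/`post-body` default and all function names are hypothetical placeholders. Fetching is left out so the parsing can be checked offline.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_urls_from_opml(opml_text):
    """Step 1a: collect feed URLs from the xmlUrl attributes of OPML <outline> elements."""
    root = ET.fromstring(opml_text)
    # iter() also walks nested outlines (folders of feeds).
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]


def post_urls_from_feed(feed_text):
    """Step 1b: collect post URLs from an RSS 2.0 or Atom 1.0 feed (RSS 1.0/RDF is not handled)."""
    root = ET.fromstring(feed_text)
    urls = []
    # RSS 2.0: <rss><channel><item><link>URL</link></item>...
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            urls.append(link.strip())
    # Atom 1.0: <feed><entry><link rel="alternate" href="URL"/></entry>...
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            # A link without rel counts as rel="alternate" in Atom.
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                urls.append(link.get("href"))
                break
    return urls


class _PostTextExtractor(HTMLParser):
    """Step 2: keep text inside the (hypothetical) template container, skipping script/style."""

    def __init__(self, container_tag, container_class):
        super().__init__()  # convert_charrefs=True by default, so entities are decoded
        self.container_tag = container_tag
        self.container_class = container_class
        self.depth = 0  # > 0 while inside the container element
        self.skip = 0   # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag == self.container_tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.depth > 0:
                self.depth += 1  # nested element of the same tag name
            elif self.container_class in classes:
                self.depth = 1   # entered the post body

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag == self.container_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and self.skip == 0:
            self.chunks.append(data)


def extract_post_text(html, container_tag="div", container_class="post-body"):
    """Return whitespace-normalised text of the post container (crude: tags become spaces)."""
    parser = _PostTextExtractor(container_tag, container_class)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

To run it on live data, fetch each URL (for example with `urllib.request.urlopen(url).read()`); the two XML functions accept the raw bytes, while the HTML should be decoded to `str` before calling `extract_post_text`. The third-party `feedparser` package copes with far more feed variants (RSS 0.9x/1.0, malformed feeds) than this hand-rolled parser.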
Thanks and Regards,