[Corpora-List] blog dataset availability

Paula Chesley ches0045
Sat Apr 4 03:34:29 CEST 2009


Hi corpora list members,

I'm looking for a fairly large blog dataset that is marked up for the following attributes:

- writer ID
- blog ID
- reader IDs (readers who are themselves writers of other blogs/entries)
- time of publication
- whether, and how often, the blog is referenced by other blogs (i.e., network information)

The ICWSM 2009 dataset is *almost* what I'm looking for, but not quite: it doesn't have specific trackback information, such as which specific blogs (in terms of URLs) link to a given blog or to a given post on that blog. I need this information to see how a linguistic variable spreads through the blogosphere.

If you know about such a dataset, I'd appreciate any information you might have!

Thanks,
Paula


