1. How can language-specific feed URLs (RSS/Atom) be discovered? Is crawling the web the only solution? Is it possible to piggyback on search engines (preferably Bing)?
2. Are there any large open-source repositories of language-specific feed URLs?
3. What are the best practices for implementing a feed aggregator? Are there any known statistics on the percentage of pingback blogs in the blogosphere?
4. Are there any existing feed aggregators which can handle millions of feeds (intelligently)?
5. Are there any affordable licensed or open-source tools that implement any of the above steps?
We would also like to know more about similar projects by other groups (and possibly collaborate).
Thanks very much,
=================================================
Siva Reddy http://www.sivareddy.in
Lexical Computing Ltd. http://www.sketchengine.co.uk
University of York http://www.cs.york.ac.uk
=================================================