> I need a tool for extracting all the text from pages and subpages of a Web
> Forum. I do not need a cleaning tool at the moment.
> Can you suggest a tool to perform this operation?
We developed SiteScraper (http://sitescraper.googlecode.com) at Melbourne University for exactly this purpose: scraping threads from web forums while preserving as much structure as possible (e.g. posts, post titles, thread titles, timestamps, and post authors). You will need to provide a few training instances (literally a handful), but beyond that it should just work. Email me off-list if you would like more details.
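To give a rough idea of how learning from a handful of training instances can work, here is a simplified illustration using only the Python standard library. It is not SiteScraper's actual implementation or API. It assumes that the text you want sits at a consistent tag/class path on every forum page, and the names `learn_paths` and `extract` are hypothetical. You give it example pages plus the strings you want pulled out, it records where those strings live in the markup, and it then extracts whatever sits at the same locations on new pages.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Record (tag path, text) for every non-blank text node in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, as "tag" or "tag.class"
        self.nodes = []   # list of (path tuple, stripped text)

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        cls = dict(attrs).get("class") or ""
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates sloppy forum markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".", 1)[0] == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self.stack), text))


def text_nodes(html):
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_paths(examples):
    """Hypothetical trainer. examples: list of (html, [strings to extract])."""
    paths = set()
    for html, wanted in examples:
        for path, text in text_nodes(html):
            if text in wanted:
                paths.add(path)
    return paths


def extract(html, paths):
    """Return the text of every node whose tag path was seen during training."""
    return [text for path, text in text_nodes(html) if path in paths]


if __name__ == "__main__":
    train = """<html><body>
<div class="post"><span class="author">alice</span><p class="body">First post</p></div>
<div class="post"><span class="author">bob</span><p class="body">Reply here</p></div>
</body></html>"""
    paths = learn_paths([(train, ["First post", "Reply here"])])
    new_page = """<html><body><div class="nav">Home</div>
<div class="post"><span class="author">carol</span><p class="body">Another thread</p></div>
</body></html>"""
    print(extract(new_page, paths))  # ['Another thread']
```

A real tool has to cope with far messier cases (varying page layouts, text split across inline tags, pagination across subpages), which is where more training examples and a more robust model come in.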