> At any rate, if you want to have access to a large, current corpus -- with complete and total and thoroughly satisfying full-text access -- then why not just create your own corpus, and then keep it updated?
Heh. We actually tried that. We got a GSoC summer student who, we'd hoped, would work with both the WaCky group and the Wikimedia/Nutch/Lucene people to build a distributed web crawler that would keep a corpus up to date (and also serve as input fodder to improve search, hence the connection to the search engine folks). Unfortunately, he wasn't closely supervised and wandered off in a less useful direction.
--linas