[Corpora-List] Frequency lists (corrected)
stefan.evert at uos.de
Mon Feb 23 21:29:11 CET 2009
> There is, of course, the Google language modeling data, based on over
> a trillion words' worth of web pages:
In that context, I can't resist pointing out my signature ...
The wonders of Googleology (episode 1)
"from collectibles to cars"
84,700,000 -- Google
9,443,672 -- Google N-grams (Web 1T5)
1 -- ukWaC
[ stefan.evert at uos.de | http://purl.org/stefan.evert ]