[Corpora-List] word frequencies on the web

Dragomir R. Radev radev at umich.edu
Fri Dec 8 17:57:05 CET 2006

Have you seen this release from Google:



This data set, contributed by Google Inc., contains English word
n-grams and their observed frequency counts. The length of the n-grams
ranges from unigrams (single words) to five-grams. We expect this data
will be useful for statistical language modeling, e.g., for machine
translation or speech recognition, as well as for other uses.

Source Data

The n-gram counts were generated from approximately 1 trillion word
tokens of text from publicly accessible Web pages.
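For readers new to n-gram count data, here is a small Python sketch of what "n-grams and their observed frequency counts" from unigrams to five-grams means. It also shows how a raw count could be turned into a relative frequency, which is roughly what the original question asks for. This is a hypothetical illustration, not Google's pipeline. The names `ngram_counts` and `per_million` are made up, whitespace splitting stands in for real tokenization, and the 10^12 total is an assumption based on the release's "approximately 1 trillion word tokens".

```python
from collections import Counter


def ngram_counts(tokens, max_n=5):
    """Count every n-gram of length 1..max_n (unigrams to five-grams)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of length n across the token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Assumption: the corpus size is taken as exactly 10**12 tokens,
# approximating the release's "approximately 1 trillion word tokens".
TOTAL_TOKENS = 1_000_000_000_000


def per_million(count, total=TOTAL_TOKENS):
    """Convert a raw count into occurrences per million tokens."""
    # Multiply first so integer inputs stay exact before the division.
    return count * 1_000_000 / total


# Simplified whitespace tokenization, for illustration only.
tokens = "the cat sat on the mat".split()
counts = ngram_counts(tokens)
print(counts[("the",)])        # 2
print(counts[("the", "cat")])  # 1
print(per_million(2_000_000))  # 2.0
```

A word seen 2 million times in a 1-trillion-token sample works out to about 2 occurrences per million words. The real data set may apply its own tokenization and filtering, so counts looked up there will not match this toy counter exactly.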


> Dear all, does anyone know of ways to estimate the frequency of words
> on the web, or if there're search engines that supply this info (as
> Altavista used to do)?
>
> thank you!
> tony
> www2.lael.pucsp.br/~tony





Dragomir R. Radev Associate Professor
SI, CSE, Ling U. Michigan, Ann Arbor
http://www.eecs.umich.edu/~radev radev at umich.edu
