[Corpora-List] word frequencies on the web

Dragomir R. Radev radev at umich.edu
Fri Dec 8 17:57:05 CET 2006


Have you seen this release from Google?

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13


Introduction

This data set, contributed by Google Inc., contains English word
n-grams and their observed frequency counts. The length of the n-grams
ranges from unigrams (single words) to five-grams. We expect this data
will be useful for statistical language modeling, e.g., for machine
translation or speech recognition, as well as for other uses.
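
As a rough illustration of what "n-grams and their observed frequency counts" means, here is a minimal sketch that counts all n-grams from unigrams up to five-grams in a toy token list. The function name `count_ngrams` and the whitespace tokenization are assumptions made for this example; this is not how Google produced the data.

```python
from collections import Counter

def count_ngrams(tokens, max_n=5):
    """Count every n-gram of length 1..max_n in a list of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n across the token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy input; real data would need proper tokenization.
tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens)
```

Each key is a tuple of tokens, so `counts[("the",)]` is a unigram count and `counts[("the", "cat")]` is a bigram count.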

Source Data

The n-gram counts were generated from approximately 1 trillion word
tokens of text from publicly accessible Web pages.
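
To turn such counts into a word-frequency estimate, one option is to divide a word's count by the total number of tokens. The sketch below assumes, hypothetically, one n-gram per line followed by a tab and its count; check the data set's own documentation for the real file layout. The sample lines and the total are made-up numbers, not values from the release.

```python
def parse_counts(lines):
    """Parse 'ngram<TAB>count' lines into a dict (format is an assumption)."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so n-grams containing spaces stay intact.
        ngram, _, count = line.rpartition("\t")
        counts[ngram] = int(count)
    return counts

def per_million(count, total_tokens):
    """Relative frequency expressed as occurrences per million tokens."""
    return count * 1_000_000 / total_tokens

# Made-up sample data for illustration only.
sample = ["corpus\t500\n", "word\t1500\n"]
counts = parse_counts(sample)
total = 1_000_000
freq = per_million(counts["corpus"], total)
```

For the real data set the denominator would be the corpus token total, which the description gives only approximately (about 1 trillion).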




> Dear all, does anyone know of ways to estimate the frequency of words
> on the web, or if there're search engines that supply this info (as
> Altavista used to do)?
>
> thank you!
> tony
> www2.lael.pucsp.br/~tony



--
Dragomir R. Radev Associate Professor
SI, CSE, Ling U. Michigan, Ann Arbor
http://www.eecs.umich.edu/~radev radev at umich.edu




