[Corpora-List] Re: problems with Google counts

Thu Mar 17 03:35:06 CET 2005

Hi, Corpora Guys,

Sorry, I don't remember who suggested simply repeating the word in a
Google query to get a supposedly more realistic count of pages containing
it (I had deleted all those messages after reading them). I tried this
yesterday on a couple of Spanish words (eficaz, eficiente). (By the way,
the results were apparently consistent with a student's search of the
100,000,000-word corpusdelespañol site.) Anyway, what repeating the word
apparently does is limit the results to those pages which contain the word
at least twice, in this case cutting the counts by roughly 10%.
If that is what is happening, it implies serious problems for relatively
rare words, which may not occur twice on very many pages at all. At any
rate, the proportional decrease in pages found seemed to be about the same
for both words. (We're talking here about roughly 1.5M original hits.)
If I'm missing the point of the suggestion, please straighten me out.
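
To make the worry about rare words concrete, here is a minimal sketch in Python. It assumes, and this is only my guess at the behaviour, that a doubled query keeps just the pages where the word occurs at least twice. The function name page_counts and the toy pages are invented for illustration; nothing here reflects how Google actually computes its counts.

```python
def page_counts(pages, word, min_occurrences=1):
    """Count pages in which `word` occurs at least `min_occurrences` times."""
    return sum(1 for tokens in pages if tokens.count(word) >= min_occurrences)


# Invented toy "pages" (already tokenized), purely for illustration.
pages = [
    "eficaz eficaz método".split(),
    "un método eficaz".split(),
    "eficaz y eficiente".split(),
    "eficaz eficaz eficaz".split(),
    "nada relevante".split(),
]

for word in ("eficaz", "eficiente"):
    at_least_once = page_counts(pages, word, 1)   # ordinary query (assumed)
    at_least_twice = page_counts(pages, word, 2)  # doubled query (assumed)
    print(word, at_least_once, at_least_twice)
```

In this toy data the word that appears on only one page drops to zero under the "at least twice" filter, which is the kind of distortion I would expect for relatively rare words if the hypothesis holds.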


James L. Fidelholtz
Posgrado en Ciencias del Lenguaje, ICSyH
Benemérita Universidad Autónoma de Puebla MÉXICO
