[Corpora-List] Google "region"-based searches

Trevor Jenkins trevor.jenkins at suneidesis.com
Wed Nov 28 11:46:35 CET 2012


On 28 Nov 2012, at 03:31, Diego Molla-Aliod <diego.molla-aliod at mq.edu.au> wrote:


> Google's (and Yahoo's) hit counts are estimates and you shouldn't rely on them too much.

Personally I never trust those numbers. When using Google we should be like the Pirahã tribe and use mankind's innate numbering scheme, in which there are only three numbers: one, two and many.


> I once tried to incorporate them to recreate Magnini et al's work "Is It the Right Answer? Exploiting Web Redundancy for Answer Validation" but I gave up due to the inconsistencies in hit counts returned by Google and Yahoo. This was back in 2009 but I would be surprised if things were different now.

The numbers will change over time. More pages are spidered, increasing the number of hits (daily?), and pages are removed when legal proceedings succeed because they contain defamation, libel or slander, or because the content breaks some court injunction (super or normal). Some bizarre removals take place: for example, the recent one where the wife of a leading UK politician had to remove the name of a girl from her Twitter feed because publishing it broke some child/sexual victim protection law, even though a trivial search will turn up that same name in press reports of the case from when the girl disappeared to France with her Maths teacher.

Regards, Trevor.

<>< Re: deemed!


