[Corpora-List] Google "region"-based searches

Diego Molla-Aliod diego.molla-aliod at mq.edu.au
Wed Nov 28 04:31:57 CET 2012

Google's (and Yahoo's) hit counts are estimates, and you shouldn't rely on them too much. I once tried to use them to recreate Magnini et al.'s work "Is It the Right Answer? Exploiting Web Redundancy for Answer Validation", but I gave up because of the inconsistencies in the hit counts returned by Google and Yahoo. This was back in 2009, but I would be surprised if things were different now.
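A toy illustration of the expectation in the quoted message below: with an exact Boolean AND over an inverted index, adding a query term can only shrink (or keep) the set of matching documents, so a count that goes up when a word is added is a sign of estimation rather than exact counting. The documents and helper names here are hypothetical, and this is a sketch of exact conjunctive retrieval, not a description of how Google's index actually works.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def and_query(index, terms):
    """Exact Boolean AND: ids of documents that contain every term."""
    result = None
    for term in terms:
        # dict.get does not trigger defaultdict's factory, so unknown terms give set()
        postings = index.get(term.lower(), set())
        # copy on first use so the index's own posting sets are never shared
        result = set(postings) if result is None else result & postings
    return result if result is not None else set()

# Hypothetical mini-collection
docs = [
    "enterprise integration pattern catalogue",
    "enterprise integration pattern examples in java",
    "integration testing pattern",
    "enterprise java beans",
]
index = build_index(docs)
base = and_query(index, ["enterprise", "integration", "pattern"])
narrower = and_query(index, ["enterprise", "integration", "pattern", "java"])
print(len(base), len(narrower))  # 2 1
assert narrower <= base  # adding a conjunct yields a subset
```

A phrase constraint (as with a quoted query) would only filter these sets further, so the same monotonicity holds for exact phrase search.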


On 28 November 2012 10:18, Trevor Jenkins <trevor.jenkins at suneidesis.com> wrote:

> On 27 Nov 2012, at 23:00, John F Sowa <sowa at bestweb.net> wrote:
> > In ancient times (pre 21st century), Google supported Boolean
> > expressions for searching. But now it's impossible to control
> > their search in any predictable fashion.
> Google's implementation of Boolean expressions was never that good anyway.
> Their NOT (the - sign) never really worked as a Boolean NOT; it was more of a
> "we'll disregard your request if we feel like it". Couple that with the
> lack of any (working) collocation features and it's a poor excuse for a
> text/document retrieval system.
> > But when I type just "enterprise integration pattern" by itself,
> > I get 114,000 hits. When I add another word, the number should
> > decrease. But the following combination gets 137,000 hits:
> There also used to be (and probably still is) a hidden "feature" in that
> Google would terminate searches after some time slice. Even if there were
> more hits available, you didn't see them. It used to be simple to demonstrate
> by submitting the same search request several times in quick succession:
> never the same answer twice. The only result counts you can believe are
> zero and one; anything else is practically non-deterministic.
> > Does anybody know how to bypass the Google heuristics and
> > force it to use a simple regular expression for searching?
> Sadly, no, other than using a search engine with a better search system
> behind it. But unfortunately Google has, for the moment, the largest cache
> of web pages and documents.
> Personally, I question whether Google is still a search engine; these days
> it is more a targeted-adverts engine. (Thank god for browser add-ons like
> AdBlockPlus, Ghostery, GreaseMonkey and the like for squelching those
> nasty adverts.)
> [I should declare a commercial interest here: I worked for Paralog, who
> produced one of the best … no, *the* best text retrieval system, TRIP.
> The product still exists, although I've not been associated with it for
> over a decade. But it still remains the best there is, if you can afford
> to purchase it.]
> Regards, Trevor.
> <>< Re: deemed!


---------------------------------------------------------------------
Dr. Diego MOLLA ALIOD
diego.molla-aliod at mq.edu.au
Department of Computing
http://web.science.mq.edu.au/~diego
Macquarie University
