[Corpora-List] Google "region"-based searches

Trevor Jenkins trevor.jenkins at suneidesis.com
Wed Nov 28 00:18:03 CET 2012

On 27 Nov 2012, at 23:00, John F Sowa <sowa at bestweb.net> wrote:

> In ancient times (pre 21st century), Google supported Boolean
> expressions for searching. But now it's impossible to control
> their search in any predictable fashion.

Google's implementation of Boolean expressions was never that good anyway. Their NOT (the - sign) never really worked as a Boolean NOT; it was more of a "we'll disregard your request if we feel like it". Couple that with the lack of any (working) collocation features and it's a poor excuse for a text/document retrieval system.

> But when I type just "enterprise integration pattern" by itself,
> I get 114,000 hits. When I add another word, the number should
> decrease. But the following combination gets 137,000 hits:

There also used to be, and probably still is, a hidden "feature" in that Google would terminate searches after some time slice. Even if there were more hits available, you didn't see them. It used to be simple to demonstrate by submitting the same search request several times in quick succession: never the same answer twice. The only result counts you can believe are zero and one; anything else is practically non-deterministic.
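For contrast, here is a minimal sketch of what a strict Boolean system would do, assuming a toy in-memory inverted index. The documents and the names `build_index` and `search` are made up for illustration; they are not anything Google or any product exposes. AND is set intersection and NOT is set difference, so adding a required term can never increase the hit count, and the same query always returns the same set.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, all_ids, required=(), excluded=()):
    """Strict Boolean search: AND every required term, then NOT every excluded term."""
    hits = set(all_ids)
    for term in required:
        # Intersection can only keep or shrink the result set.
        hits &= index.get(term.lower(), set())
    for term in excluded:
        # A true Boolean NOT removes every document containing the term.
        hits -= index.get(term.lower(), set())
    return hits


# Hypothetical four-document collection, purely for illustration.
DOCS = {
    1: "enterprise integration pattern catalogue",
    2: "enterprise integration pattern for messaging",
    3: "messaging pattern",
    4: "enterprise software",
}
INDEX = build_index(DOCS)

base = search(INDEX, DOCS, required=["enterprise", "integration", "pattern"])
narrowed = search(
    INDEX, DOCS, required=["enterprise", "integration", "pattern", "messaging"]
)
negated = search(INDEX, DOCS, required=["pattern"], excluded=["enterprise"])

print(len(base), len(narrowed), len(negated))
```

In this sketch the narrowed query is always a subset of the base query, which is the behaviour the hit counts quoted above fail to show.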

> Does anybody know how to bypass the Google heuristics and
> force it to use a simple regular expression for searching?

Sadly no, other than using a search engine with a better search system behind it. But unfortunately Google has, for the moment, the largest cache of web pages and documents.

Personally I question whether Google is still a search engine; it is more a targeted adverts engine these days. (Thank god for browser add-ons like AdBlockPlus, Ghostery, GreaseMonkey and their like for squelching those nasty adverts.)

[I should declare a commercial interest here: I worked for Paralog, who produced one of the best … no, *the* best text retrieval system, TRIP. The product still exists, although I've not been associated with it for over a decade. It remains the best there is, if you can afford to purchase it.]

Regards, Trevor.

<>< Re: deemed!
