[Corpora-List] Google "region"-based searches

Tristan Miller miller at ukp.informatik.tu-darmstadt.de
Wed Nov 28 10:49:12 CET 2012


Greetings.

On 28/11/12 12:00 AM, John F Sowa wrote:
> In ancient times (pre 21st century), Google supported Boolean
> expressions for searching. But now it's impossible to control
> their search in any predictable fashion.
>
> For example, I wanted to count the number of web pages that used
> the phrase "enterprise integration pattern" and the word 'sql'.
>
> But when I type just "enterprise integration pattern" by itself,
> I get 114,000 hits. When I add another word, the number should
> decrease. But the following combination gets 137,000 hits:
>
> "enterprise integration pattern" sql
>
> The following combination gets 274,000 hits:
>
> "enterprise integration pattern" java
>
> And the following gets 25,900,000 hits:
>
> "enterprise integration pattern" java sql
>
> I get the same numbers with a one-line search or with
> their so-called advanced search.
>
> Does anybody know how to bypass the Google heuristics and
> force it to use a simple regular expression for searching?

Google used to support a "+" modifier for individual search terms; it instructed the search engine to return only those pages that actually contain the marked term. (Without the modifier, Google was free to disregard a search term at its discretion.) The "+" modifier was dropped, probably for marketing reasons, once Google+ was introduced. Supposedly you can now achieve the same effect by putting each "required" term in quotation marks, and in my experience this works most of the time. For your examples, it appears that sometimes it does and sometimes it doesn't:

"enterprise integration pattern"

gets 117,000 hits, but oddly both

"enterprise integration pattern" sql

and

"enterprise integration pattern" "sql"

get 137,000 results. On the other hand,

"enterprise integration pattern" java sql

gets 25,800,000 results, but

"enterprise integration pattern" "java" "sql"

returns a more sensible 8520 results.
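
If you want to apply the quoting workaround consistently, something like the following Python sketch could build the query string for you. The function name quote_required_terms is my own invention for illustration, and the sketch only constructs the text of the query; it does not contact Google, and whether Google actually honours the quotes is, as the numbers above show, not guaranteed.

```python
def quote_required_terms(terms):
    """Wrap every term in double quotes so each is marked as "required".

    Hypothetical helper for illustration: it only builds the query text
    and makes no claim about how the search engine will interpret it.
    """
    quoted = []
    for term in terms:
        # Drop surrounding whitespace and any quotes the caller already added.
        stripped = term.strip().strip('"')
        if stripped:  # skip empty terms
            quoted.append('"' + stripped + '"')
    return " ".join(quoted)


query = quote_required_terms(["enterprise integration pattern", "java", "sql"])
print(query)  # "enterprise integration pattern" "java" "sql"
```

This reproduces the last query form above, the one that returned the more plausible count.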

Regards, Tristan

-- Tristan Miller, Doctoral Researcher Ubiquitous Knowledge Processing Lab (UKP-TUDA) Department of Computer Science, Technische Universität Darmstadt Tel: +49 6151 16 6166 | Web: http://www.ukp.tu-darmstadt.de/
