[Corpora-List] Re: problems with Google

Tom Emerson tree at basistech.com
Sat Mar 19 21:22:00 CET 2005


Pascal Soucy writes:

> Google does that with all stopwords. If you search for:
> what does "the" "the" mean, you'll get the same behavior. Google ignores
> stopwords (and * seems to be treated as a stopword).


Not really. Two identical stopwords in succession are kept. Try a
search for "The The" (a band from the late '80s) and you will get hits
on the determiner usage in isolation. You also get different hits for
a search of simply "the".

-tree
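
For anyone who wants to play with the idea, here is a minimal sketch of a query filter that behaves the way I describe above: an isolated stopword is dropped, but two identical stopwords in succession are kept. This is only an illustration of the observed behavior, not Google's actual implementation; the STOPWORDS set and the function name filter_query are made up for the example.

```python
# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"the", "a", "an", "of"}


def filter_query(terms):
    """Drop isolated stopwords, but keep a stopword that is immediately
    repeated (e.g. "the the"). A toy model of the observed behavior,
    not a description of any real search engine's internals."""
    words = [t.lower() for t in terms]
    kept = []
    for i, word in enumerate(words):
        if word not in STOPWORDS:
            kept.append(word)
            continue
        # A stopword survives only if its neighbor is the same word.
        prev_same = i > 0 and words[i - 1] == word
        next_same = i + 1 < len(words) and words[i + 1] == word
        if prev_same or next_same:
            kept.append(word)
    return kept


if __name__ == "__main__":
    print(filter_query(["The", "The"]))
    print(filter_query(["the", "band"]))
```

Under these assumptions the query "The The" survives intact, while "the band" is reduced to "band".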


> Both queries:
>
> what does "*" mean
>
> and
>
> what does "*" "*" mean
>
> result in about the same list of documents. The difference between the two
> occurs in the ranking process. The ranking algorithm likely uses term proximity
> to better match the query as it is written, and it keeps the position of
> stopwords in the query to do that.
>
> Pascal Soucy
> Coveo
>

> According to John Milton <lcjohn at ust.hk>, 17.03.2005:
>
> > I just discovered that Google seems to have retained some use of the
> > wildcard for words if you use double quotes with the asterisk. A search
> > for "what does "*" mean" and "what does "*" "*" mean" results MAINLY in
> > any one and two words respectively. If anyone else is using web searches
> > as language learning/teaching resources, this also looks promising:
> > http://www.findforward.com/
> >
> > John Milton
> > Hong Kong University of Science & Technology


--
Tom Emerson                          Basis Technology Corp.
Software Architect                   http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"




