[Corpora-List] Re: problems with Google

tianfang xut at cse.ohio-state.edu
Fri Mar 18 00:24:08 CET 2005


I didn't check whether the behaviour of Google's Web API differs from its
standard search, but I noticed that the API sometimes claims it will
return N results when it actually doesn't. So if you fetch 10 results
starting from some start index, you may find that fewer than 10 actually
come back.
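A minimal sketch of paging defensively, in Python: it counts how many results each request actually returns instead of trusting the claimed total. fetch_page is a hypothetical stand-in for your client call (it is not the Web API's real method name or signature), fake_fetch is a toy for illustration, and advancing the start index by the requested page size is an assumption about how the engine numbers its results.

```python
from typing import Callable, List, Tuple

# Hypothetical client call: fetch_page(query, start, count) -> list of result URLs.
# It stands in for whatever search call you use; it is NOT the actual method
# name or signature of Google's Web API.
FetchPage = Callable[[str, int, int], List[str]]


def collect_results(fetch_page: FetchPage, query: str, max_results: int = 100,
                    page_size: int = 10) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Page through results, counting what each request actually returns.

    Returns (results, short_pages), where short_pages lists (start, received)
    for every request that came back with fewer than page_size results.
    """
    results: List[str] = []
    short_pages: List[Tuple[int, int]] = []
    start = 0
    while start < max_results:
        page = fetch_page(query, start, page_size)
        if not page:
            break  # nothing more, whatever total the engine claimed
        if len(page) < page_size:
            short_pages.append((start, len(page)))
        results.extend(page)
        # Assumption: start indices refer to the engine's numbering, so we
        # advance by what we asked for, not by what we received.
        start += page_size
    return results, short_pages


def fake_fetch(query: str, start: int, count: int) -> List[str]:
    """Toy stand-in: the second page holds only 7 results instead of 10."""
    sizes = {0: 10, 10: 7}
    n = min(sizes.get(start, 0), count)
    return [f"{query}-{start + i}" for i in range(n)]
```

With the toy fetcher, collect_results(fake_fetch, "test", max_results=30) returns 17 results and reports the short page as (10, 7).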

Deane, Paul wrote:

> Has anybody checked whether the behavior with Google's Web API and its
> standard search is different?
>
> I have code using the Java Web API which makes use of the asterisk to
> blank out a single word (not an unrestricted wildcard.) As of yesterday,
> when I tested the code, it still appeared to be working as designed.
>
> -----Original Message-----
> *From:* Andrew Kehoe [mailto:Andrew.Kehoe at uce.ac.uk]
> *Sent:* Thursday, March 17, 2005 9:27 AM
> *To:* CORPORA at uib.no
> *Subject:* RE: [Corpora-List] Re: problems with Google
>
> John
>
> Even if you put double quotes around the wildcard character Google
> will ignore it. When you search for:
>
> "what does "*" mean"
>
> Google is actually searching for 2 'phrases': "what does " and
> " mean". You cannot nest double quotes in Google so the double quotes
> around the * are actually closing your initial quote and beginning a
> new quote, with the wildcard ignored completely.
>
> It may be the case that SOME of the pages Google returns will
> contain "what does", followed by one other word, followed by "mean"
> but your query does not ask for this specifically. Google could (and
> does) also return pages containing "mean" and "what does" in the
> opposite order, or with multiple words in between.
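
Since the query cannot express "exactly one word between these two phrases", one workaround is to check the returned text yourself. A minimal sketch in Python, assuming you already have page text or snippets as plain strings (how you obtain them is left out); the pattern ONE_WORD_SLOT and the function slot_fillers are hypothetical names, and "word" is simplified to letters, digits, apostrophes and hyphens.

```python
import re
from typing import Iterable, List

# Exactly one word between "what does" and "mean"; that word is captured.
# Simplifying assumption: a "word" is a run of letters, digits, ' or -.
ONE_WORD_SLOT = re.compile(r"\bwhat\s+does\s+([\w'-]+)\s+mean\b", re.IGNORECASE)


def slot_fillers(snippets: Iterable[str]) -> List[str]:
    """Return the lower-cased slot word from every matching snippet.

    Snippets with the phrases reversed, or with two or more words in the
    slot, simply do not match and are dropped.
    """
    fillers: List[str] = []
    for text in snippets:
        for match in ONE_WORD_SLOT.finditer(text):
            fillers.append(match.group(1).lower())
    return fillers
```

For example, "What does LOL mean in texting?" yields "lol", while "what does it really mean" and "mean what does" yield nothing.
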

>
> Similarly, "what does "*" "*" mean" is actually searching for 3
> 'phrases': 1) "what does ", 2) " " (a space), and 3) " mean".
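
The quote pairing described above can be modelled in a few lines. This is a minimal sketch of the left-to-right, no-nesting rule as described here, not Google's actual query parser; split_on_quotes is a hypothetical name.

```python
from typing import List, Tuple


def split_on_quotes(query: str) -> Tuple[List[str], List[str]]:
    """Pair double quotes strictly left to right, with no nesting.

    Splitting on '"' puts text inside a quote pair at odd indices and text
    outside any pair at even indices. With an odd number of quotes, the
    final unterminated segment lands at an odd index and is counted as quoted.
    Returns (quoted_phrases, non_blank_unquoted_text).
    """
    parts = query.split('"')
    quoted = parts[1::2]
    unquoted = [p for p in parts[0::2] if p.strip()]
    return quoted, unquoted
```

For "what does "*" mean" this yields the phrases 'what does ' and ' mean', with the * left outside any quote; the two-asterisk query yields 'what does ', ' ' and ' mean', with both asterisks outside.
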

>
> So, Google hasn't retained support for wildcards at all I'm afraid,
> and this is why we are developing our own search engine in WebCorp,
> as Antoinette Renouf mentioned yesterday.
>
> Andrew Kehoe
> Research and Development Unit for English Studies
> University of Central England in Birmingham
>
> http://www.webcorp.org.uk/
>
> -----Original Message-----
> *From:* owner-corpora at lists.uib.no on behalf of John Milton
> *Sent:* Thu 17/03/2005 13:39
> *To:* CORPORA at uib.no
> *Cc:*
> *Subject:* [Corpora-List] Re: problems with Google
>
> I just discovered that Google seems to have retained some use of the
> wildcard for words if you use double quotes with the asterisk. A search
> for "what does "*" mean" and "what does "*" "*" mean" returns results
> with MAINLY one and two words, respectively, in the slot. If anyone else
> is using web searches as language learning/teaching resources, this also
> looks promising:
> http://www.findforward.com/
>
> John Milton
> Hong Kong University of Science & Technology