[Corpora-List] Re: problems with Google

Andrew Kehoe Andrew.Kehoe at uce.ac.uk
Thu Mar 17 16:18:28 CET 2005


John

Even if you put double quotes around the wildcard character, Google will ignore it. When you search for:

"what does "*" mean"

Google is actually searching for 2 'phrases': "what does " and " mean". You cannot nest double quotes in Google so the double quotes around the * are actually closing your initial quote and beginning a new quote, with the wildcard ignored completely.

It may be the case that SOME of the pages Google returns will contain "what does", followed by one other word, followed by "mean", but your query does not ask for this specifically. Google could (and does) also return pages containing "mean" and "what does" in the opposite order, or with multiple words in between.
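To make the distinction concrete, here is a minimal sketch of what a genuine one-word wildcard would have to check, written as a filter you could run over page text yourself. The names ONE_WORD_GAP and one_word_filler are hypothetical, and treating a "word" as any run of non-space characters is an assumption. It is not a description of how Google or WebCorp works.

```python
import re

# "what does", exactly one whitespace-delimited token, then "mean".
# Assumption: a "word" is any run of non-space characters.
ONE_WORD_GAP = re.compile(r"\bwhat does\s+(\S+)\s+mean\b", re.IGNORECASE)


def one_word_filler(text):
    """Return the single word between 'what does' and 'mean', or None."""
    match = ONE_WORD_GAP.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    for sample in (
        "what does corpus mean",            # one word in the gap
        "what does this word mean",         # two words in the gap
        "I mean, what does he want",        # phrases in the opposite order
    ):
        print(repr(sample), "->", one_word_filler(sample))
```

Only the first sample passes the filter; the other two are the kinds of page that the two-phrase query can also return.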

Similarly, "what does "*" "*" mean" is actually searching for 3 'phrases': 1) "what does ", 2) " " (a space), and 3) " mean".
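The pairing of quotes described above can be reproduced with a few lines of Python. This is only an illustration of the "no nesting" rule, under the assumption that each double quote simply toggles between inside and outside a phrase; quoted_segments and unquoted_segments are hypothetical helper names, not anything Google exposes.

```python
def quoted_segments(query):
    """Return the text between each successive pair of double quotes."""
    parts = query.split('"')
    # After splitting on the quote character, odd-indexed pieces are the
    # ones that sit between an opening quote and the next closing quote.
    return parts[1::2]


def unquoted_segments(query):
    """Return the non-empty pieces that fall outside any pair of quotes."""
    parts = query.split('"')
    return [piece for piece in parts[0::2] if piece]


if __name__ == "__main__":
    for query in ('"what does "*" mean"', '"what does "*" "*" mean"'):
        print(query)
        print("  phrases:       ", quoted_segments(query))
        print("  outside quotes:", unquoted_segments(query))
```

In both queries every asterisk lands outside the quotes, which is where it gets ignored.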

So Google hasn't retained support for wildcards at all, I'm afraid, and this is why we are developing our own search engine in WebCorp, as Antoinette Renouf mentioned yesterday.

Andrew Kehoe
Research and Development Unit for English Studies
University of Central England in Birmingham

http://www.webcorp.org.uk/

-----Original Message-----
From: owner-corpora at lists.uib.no on behalf of John Milton
Sent: Thu 17/03/2005 13:39
To: CORPORA at uib.no
Cc:
Subject: [Corpora-List] Re: problems with Google



I just discovered that Google seems to have retained some use of the
wildcard for words if you use double quotes with the asterisk. A search
for "what does "*" mean" and "what does "*" "*" mean" results MAINLY in
any one and two words respectively. If anyone else is using web searches
as language learning/teaching resources, this also looks promising:
http://www.findforward.com/

John Milton
Hong Kong University of Science & Technology




