[Corpora-List] Query on the use of Google for corpus research

Chris Jordan cjordan at cs.dal.ca
Fri May 27 15:09:01 CEST 2005


Typo in the URL of my last message. Sorry about that.


Chris Jordan wrote:

> Hello,


> I would recommend looking at the following reference as it is highly

> related:

> Craig Silverstein, Monika Henzinger, Hannes Marais, and Michael

> Moricz. Analysis of a Very Large AltaVista Query Log. Technical Note

> 1998-014, Digital SRC, 1998.

> http://gatekeeper.dec.com/pub/DEC/SRC/technicalnotes/abstracts/src-tn-1998-014.html



> There are some interesting issues with regard to examining such data.

> The first that really comes to mind is that you have to be able to

> distinguish between search sessions. This is non-trivial as users

> typically do not have a single goal when searching; there is some work

> by Spink on this topic. Gathering this query data at the client side

> and at the server side each comes with its own set of problems.
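
To make the session problem concrete, here is a minimal sketch of the simplest common heuristic: start a new session whenever the same user has been idle longer than some cutoff. The record layout (user_id, timestamp, query), the function name split_sessions, and the 30-minute cutoff are all assumptions chosen for illustration; none of them comes from the report cited above.

```python
from datetime import datetime, timedelta


def split_sessions(log, gap=timedelta(minutes=30)):
    """Group (user_id, timestamp, query) records into per-user sessions.

    A new session starts when a user's next query arrives more than
    `gap` after that user's previous query. The 30-minute default is
    an illustrative assumption, not an established standard.
    """
    sessions = []   # list of (user_id, [query, ...]) in sorted order
    last_seen = {}  # user_id -> (time of last query, current session list)
    for user_id, ts, query in sorted(log, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(user_id)
        if prev is None or ts - prev[0] > gap:
            current = []  # idle too long (or first query): open a new session
            sessions.append((user_id, current))
        else:
            current = prev[1]
        current.append(query)
        last_seen[user_id] = (ts, current)
    return sessions


# Hypothetical log entries, made up for this example.
example_log = [
    ("a", datetime(2005, 5, 27, 10, 0), "corpus"),
    ("b", datetime(2005, 5, 27, 10, 1), "google"),
    ("a", datetime(2005, 5, 27, 10, 5), "corpus linguistics"),
    ("a", datetime(2005, 5, 27, 11, 0), "weather"),
]
example_sessions = split_sessions(example_log)
```

A time cutoff like this only separates bursts of activity. It does not address the harder issue raised above, namely that one burst may mix several unrelated goals.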


> As statistics are being gathered, it is important to discuss

> properties of the user group (sample population) being evaluated.

> The diversity of the sample (or lack of it) will

> determine what kind of conclusions can be drawn.


> Hope that helps,


> Chris


> Peter K Tan wrote:


>> Just forwarding a question from a colleague. Would be grateful for

>> comments.


>> Cheers,

>> Peter


>> From: Michelle Maria Lazar

>> Sent: 27 May 2005 11.27

>> To: Peter K W Tan; Talib, I S; Vincent Ooi; Wee Hock Ann, Lionel

>> Subject: Query on the use of Google for corpus research


>> Hi all,

>> Someone has written to ask me whether there's any foreseeable

>> problem/objection in using Google to gather statistical evidence

>> on particular language usage, using key word searches. It involves

>> a submission of an article currently under review. Does anyone

>> have any experience/insight on this?


>> Cheers,


>> Michelle


