[Corpora-List] Query on the use of Google for corpus research

Chris Jordan cjordan at cs.dal.ca
Fri May 27 15:09:01 CEST 2005


Oops,

there was a typo in the URL in my last message. Sorry about that. The correct link is:

http://gatekeeper.dec.com/pub/DEC/SRC/technical-notes/abstracts/src-tn-1998-014.html

Chris Jordan wrote:


> Hello,
>
> I would recommend looking at the following reference as it is highly
> related:
> Craig Silverstein, Monika Henzinger, Hannes Marais, and Michael
> Moricz. Analysis of a Very Large AltaVista Query Log. Technical Report
> 1998-014, Digital SRC, 1998.
> http://gatekeeper.dec.com/pub/DEC/SRC/technicalnotes/abstracts/src-tn-1998-014.html
>
> There are some interesting issues with regard to examining such data.
> The first that comes to mind is that you have to be able to
> distinguish between search sessions. This is non-trivial, as users
> typically do not have a single goal when searching; there is some work
> by Spink on this topic. Gathering this query data at the client side
> and gathering it at the server side each have their own set of problems.
>
> As statistics are being gathered, it is important to discuss the
> properties of the user group (sample population) being evaluated.
> The diversity of the sample (or lack of it) will determine what
> kind of conclusions can be drawn.
>
> Hope that helps,
>
> Chris
>
> Peter K Tan wrote:
>
>> Just forwarding a question from a colleague. Would be grateful for
>> comments.
>>
>> Cheers,
>> Peter
>>
>> From: Michelle Maria Lazar
>> Sent: 27 May 2005 11.27
>> To: Peter K W Tan; Talib, I S; Vincent Ooi; Wee Hock Ann, Lionel
>> Subject: Query on the use of Google for corpus research
>>
>> Hi all,
>> Someone has written to ask me whether there's any foreseeable
>> problem/objection in using Google to gather statistical evidence
>> on particular language usage, using key word searches. It involves
>> a submission of an article currently under review. Does anyone
>> have any experience/insight on this?
>>
>> Cheers,
>>
>> Michelle
>>
>




