[Corpora-List] Query about nomenclature

Andrew Kehoe Andrew.Kehoe at uce.ac.uk
Fri Mar 11 21:30:03 CET 2005


John Sowa's original queries were

1) ngram
2) ngram not perl
3) n-gram

To get more accurate results, these should be run as

1) ngram
2) ngram -perl
3) "n-gram" (to force Google to match only 'n-gram' with a hyphen)

It is not necessary to run

"n-gram" -perl

because (as Damon Allen Davison said) the Perl module we want to filter out of the results is called Text::Ngram, not Text::N-gram.

Andrew Kehoe
Research and Development Unit for English Studies
School of English
University of Central England, Birmingham
http://rdues.uce.ac.uk/

http://www.webcorp.org.uk/


-----Original Message-----
From: owner-corpora at lists.uib.no on behalf of Normunds Gruzitis
Sent: Fri 11/03/2005 17:53
To: CORPORA at HD.UIB.NO
Cc:
Subject: RE: [Corpora-List] Query about nomenclature



Did you put "n-gram" in quotes in your search query?
Google's response to me: "Results 1 - 10 of about 63,600 for
"n-gram" -perl."

Regards,
Normunds


-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Andrew Kehoe
Sent: Friday, March 11, 2005 5:33 PM
To: John F. Sowa
Cc: CORPORA at HD.UIB.NO
Subject: RE: [Corpora-List] Query about nomenclature


John

You need to use the search term "ngram -perl" rather than "ngram not
perl" because, as Stefan Evert pointed out, "ngram not perl" just
returns pages containing all 3 of those words.
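To make the difference concrete, here is a toy model of the two queries. It is only a sketch under stated assumptions, not Google's actual implementation: every bare term is required, a term prefixed with "-" must be absent, and the word "not" has no special meaning. The three page texts are hypothetical examples.

```python
import re

def tokenize(text):
    # Lowercase alphanumeric word tokens (toy assumption: matching ignores case)
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(query, page_text):
    """Toy model: bare terms are required; '-term' must be absent.
    'not' is treated as an ordinary word, so it is also required."""
    words = tokenize(page_text)
    for term in query.split():
        if term.startswith("-"):
            if term[1:].lower() in words:
                return False  # excluded term present: reject the page
        elif term.lower() not in words:
            return False  # required term missing: reject the page
    return True

# Hypothetical pages, for illustration only
pages = {
    "perl_module": "Text::Ngram is a Perl module for ngram counting",
    "tutorial": "An ngram model estimates word probabilities",
    "essay": "This ngram tool does not use Perl",
}

for q in ["ngram not perl", "ngram -perl"]:
    print(q, "->", [name for name, text in pages.items() if matches(q, text)])
```

In this model "ngram not perl" returns only the page that happens to contain all three words, while "ngram -perl" returns only the page that mentions ngram and never mentions Perl.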

Another problem with your method is that Google ignores hyphens in
search terms. One of the pages returned for the term "n-gram" is
http://cpan.dei.uc.pt/authors/id/J/JH/JHI/ngram.pl-1.48&e=8092 but this
page does not contain the word "n-gram" at all, only "ngram" without the
hyphen.
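The hyphen problem can be sketched the same way. Again this is a toy model built on an assumption taken from the description above (unquoted hyphens are ignored, while a quoted term must match the exact hyphenated form), not a description of how Google actually indexes pages; the page texts are hypothetical.

```python
import re

def page_tokens(text):
    # Keep hyphens inside tokens so the exact form is available for quoted queries
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def normalize(token):
    # Toy assumption: outside quotes, hyphens are ignored, so "n-gram" == "ngram"
    return token.replace("-", "")

def matches_term(term, page_text):
    tokens = page_tokens(page_text)
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        # Quoted term: only the exact hyphenated form counts
        return term[1:-1].lower() in tokens
    target = normalize(term.lower())
    return any(normalize(tok) == target for tok in tokens)

# Hypothetical pages, for illustration only
pages = {
    "cpan_page": "ngram.pl counts ngram frequencies",
    "paper": "We train an n-gram language model",
}

for q in ['n-gram', '"n-gram"']:
    print(q, "->", [name for name, text in pages.items() if matches_term(q, text)])
```

Under this model the unquoted query also returns the page that only contains "ngram", which inflates the "n-gram" count; quoting the term restricts the results to pages with the hyphenated spelling.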

Andrew Kehoe
Research and Development Unit for English Studies
School of English
University of Central England, Birmingham
http://rdues.uce.ac.uk/

http://www.webcorp.org.uk/

-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of John F. Sowa
Sent: 10 March 2005 01:43
To: Damon Allen Davison
Cc: John Mckenny; CORPORA at HD.UIB.NO
Subject: Re: [Corpora-List] Query about nomenclature

Damon Davison's use of Google inspired me to try
a variation. I just typed three queries and
got the following number of hits:

Search string     Hits
--------------    ------
ngram             21,100
ngram not perl       540
n-gram            85,700

This seems to provide overwhelming evidence for
a hyphen between "n" and "gram". Since Google
doesn't distinguish capitals, that leaves the
capitalization question unresolved.

John Sowa
