[Corpora-List] Wonky ngrams

Mark Davies Mark_Davies
Fri Jan 4 15:04:58 CET 2013


In my interface to the Google Book n-grams (http://googlebooks.byu.edu/), the actual frequency data is displayed, and the numbers do make sense.

For example (from the 155-billion-word American English n-grams):

in spite (1990s): 289,536 tokens http://googlebooks.byu.edu/?c=us&q=20263989

in spite of (1990s): 287,612 tokens http://googlebooks.byu.edu/?c=us&q=20263992
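The reason these numbers "make sense" is that every token of "in spite of" also contains a token of "in spite", so the raw count of the longer n-gram should never exceed that of its prefix. A minimal sketch of that sanity check, using only the two counts quoted above, might look like this. The function names are made up for illustration, and it assumes both counts come from the same corpus with the same tokenization:

```python
# Token counts quoted above (American English n-grams, 1990s).
counts = {
    ("in", "spite"): 289_536,
    ("in", "spite", "of"): 287_612,
}


def is_prefix(shorter, longer):
    """True if `shorter` is a proper prefix of the n-gram `longer`."""
    return len(shorter) < len(longer) and longer[:len(shorter)] == shorter


def inconsistent_pairs(counts):
    """Return (prefix, longer) pairs where the longer n-gram has MORE
    tokens than its own prefix, which raw counts should never show."""
    return [
        (s, l)
        for s in counts
        for l in counts
        if is_prefix(s, l) and counts[l] > counts[s]
    ]


bigram = ("in", "spite")
trigram = ("in", "spite", "of")

# Share of "in spite" tokens that are immediately followed by "of".
share = counts[trigram] / counts[bigram]

print("inconsistent pairs:", inconsistent_pairs(counts))
print(f"share of 'in spite' followed by 'of': {share:.1%}")
```

With the figures above the list of inconsistent pairs is empty, and nearly all tokens of "in spite" are followed by "of".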

Mark D.

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

From: corpora-bounces at uib.no [corpora-bounces at uib.no] on behalf of Brett Reynolds [Brett.Reynolds at humber.ca]
Sent: Friday, January 04, 2013 5:04 AM
To: corpora at uib.no
Subject: [Corpora-List] Wonky ngrams

Can anyone explain why "in spite of" would have a higher frequency than "in spite" in the following graph from Google ngrams? http://goo.gl/u7J3F

-------------------------------------

Brett Reynolds
English Language Centre
Humber Institute of Technology and Advanced Learning
Lakeshore Campus
Toronto, Ontario
Phone: 416-675-6622 ex. 3106

brett.reynolds at humber.ca


