[Corpora-List] Diachronic frequency change

Brett Reynolds brettrey at gmail.com
Fri May 11 15:27:54 CEST 2012

The string "all of the", for example, shows a dramatic increase in frequency, measured as a percentage of the entire corpus, in the years leading up to about 1920, as can be seen in this Google Ngram graph:


Since this is a percentage, it shows an increase relative to other words. If you wanted to test the increase for significance, would it make sense to simply use this comparison (string vs. entire corpus), or would it make more sense to compare it to another similar string such as "many of the"? What statistical test would you use? Would it be best to compare the nadir and the peak, or to repeatedly compare consecutive years?
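To make the first option concrete (string vs. entire corpus, nadir vs. peak), here is a minimal sketch in Python of a log-likelihood (G2) test on a 2x2 table. It assumes raw counts are available rather than the percentages shown in the graph. The counts below are invented purely for illustration, and the function name is my own. It is one possible approach, not necessarily the most appropriate one.

```python
import math


def log_likelihood_g2(count_a, total_a, count_b, total_b):
    """G2 statistic for a 2x2 table: target string vs. all other tokens,
    in period A vs. period B."""
    observed = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    row_totals = [total_a, total_b]
    grand_total = total_a + total_b
    col_totals = [count_a + count_b, grand_total - (count_a + count_b)]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            # A cell with an observed count of zero contributes nothing.
            if observed[i][j] > 0:
                g2 += observed[i][j] * math.log(observed[i][j] / expected)
    return 2.0 * g2


# HYPOTHETICAL counts, not real Ngram data: (occurrences, corpus size)
nadir = (1200, 10_000_000)
peak = (2500, 10_000_000)

g2 = log_likelihood_g2(*nadir, *peak)

# Upper-tail p-value for a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(g2 / 2.0))

print(f"G2 = {g2:.2f}, p = {p_value:.3g}")
```

Comparing against "many of the" instead would mean replacing the corpus totals with the counts of the comparison string, so the table contrasts the two strings directly rather than one string against everything else.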

I expect that the answers will be something like "that depends on your purpose." Currently, however, I don't really have a purpose. I'm just poking around, observing, and learning.

Best, Brett

-----------------------
Brett Reynolds
English Language Centre
Humber College Institute of Technology and Advanced Learning
Toronto, Ontario, Canada
brett.reynolds at humber.ca
