[Corpora-List] Diachronic frequency change

Angus Grieve-Smith grvsmth at panix.com
Sun May 13 23:47:35 CEST 2012


On 5/11/2012 9:27 AM, Brett Reynolds wrote:
> Since this is a percentage, it shows an increase relative to other words. If you wanted to test for significance, would it make sense to simply use this comparison (string vs entire corpus) or would it make more sense to compare it to another similar string such as "many of the"? What statistical test would you use? Would it be best to compare the nadir and the peak, or to repeatedly compare consecutive years?
>
> I expect that the answers will be something like "that depends on your purpose." Currently, however, I don't really have a purpose. I'm just poking around, observing, and learning.

Not quite! The answer is "You can't test for significance if you don't have a representative sample."

Yes, when lexical items increase in frequency (per word), it's always at the expense of something else. That means that someone is making a choice to start using that word for a particular function instead of a competing construction. My guess is that the competitor here is something like "all the," but confirming that would take more detailed study.

http://books.google.com/ngrams/graph?content=all+of+the%2C+all+the&year_start=1800&year_end=1950&corpus=0&smoothing=3
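One way to look at the competition described above is to track the share of one variant among the two competing strings, rather than each string's frequency per word of the whole corpus. Below is a minimal Python sketch, assuming you already have yearly counts for both strings from some corpus. The function names and the toy counts are hypothetical placeholders, not Google Ngrams data, and the output is purely descriptive, not a significance test.

```python
# Hypothetical sketch: share of one variant among two competing
# constructions, year by year. Counts are invented placeholders.

def variant_share(counts_a, counts_b):
    """Return {year: a / (a + b)} for years present in both dicts."""
    shares = {}
    for year in sorted(counts_a.keys() & counts_b.keys()):
        total = counts_a[year] + counts_b[year]
        if total > 0:  # skip years where neither variant occurs
            shares[year] = counts_a[year] / total
    return shares


def per_million(count, corpus_tokens):
    """Relative frequency per million words (the 'per word' view)."""
    return count * 1_000_000 / corpus_tokens


if __name__ == "__main__":
    # Invented toy counts for illustration only.
    all_of_the = {1800: 20, 1900: 60}
    all_the = {1800: 80, 1900: 40}
    print(variant_share(all_of_the, all_the))  # {1800: 0.2, 1900: 0.6}
```

Because the two shares always sum to 1, a rise in one variant's share is, by construction, a fall in the other's, which is the zero-sum trade-off between competing constructions in miniature.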

There are other frequency effects, described by Joan Bybee and others: frequently used strings change their meanings in relatively predictable ways and undergo phonological reduction. You can't really investigate that with Google Ngrams, but it can give you an idea of where to start.

http://www.unm.edu/~jbybee/page4.html

--

-Angus B. Grieve-Smith

Saint John's University

grvsmth at panix.com