Similarly with dictionaries: they concentrate on the canon (and enough other authors to be able to illuminate the meaning of the canon), and why shouldn't they? Most users of dictionaries use them to read the canon. Hence the peaks. And these effects get compounded when you look at literature from the past, because the canon survives preferentially: there are more copies of the canon, and people value the canon more, so more of it survives. I do not think that the OED, even though it is admirably non-normative and comprehensive and so on, is immune to these effects (it is very hard to avoid them when dealing with historical texts). But it's not a corpus, it's a dictionary.
On Tue, Apr 26, 2011 at 09:51:26AM -0400, John F. Sowa wrote:
> On 4/25/2011 5:12 PM, chris brew wrote:
> >I think part of the 1600 bump must correspond to William Shakespeare
> >(1564-1616, first folio published 1623, second folio published 1632)
> >and that a corresponding bump from 1380-1400 corresponds to Chaucer (you
> >have to set the granularity to 10 years to see it clearly)
> >Something else happened in the 1650-1659 decade. I have a plausible
> >hypothesis but no more...
> Those are interesting hypotheses about the effects of literature
> and the methods of recording, distribution, and preservation.
> Some of those effects are probably distorted by historical accidents
> of loss and preservation. But the decisions of editors about which
> sources to consider would also influence the results.
> Ted Pedersen:
> >... there are local peaks around the years 1400, 1600, and 1900,
> >with valleys around 1500, 1750, and the present day.
> I can't believe that the present day with the huge expansion
> of the WWW is a true valley. And the valley around 1750 was
> a period of active colonization that may have produced many
> words that weren't recorded in the OED sources.
> It would be interesting to do a more detailed study of word
> creation and disuse by going back to the original documents,
> when more of them become digitized.
> John Sowa