[Corpora-List] the ebb and flow of inclusion of words in OED?

Graham White graham at eecs.qmul.ac.uk
Tue Apr 26 16:26:56 CEST 2011

I think that the effects of sources and sampling are quite large here, and it's something that happens in a lot of fields (for example, the history of philosophy). What tends to happen is that a small number of authors get established as canonical, and then others are investigated because of their connection with the canonical ones. Thus, for example, Brentano and Schleiermacher are both missing from all but the specialised histories of philosophy, even though they are both interesting and also influential: the standard histories, though, tell the story of nineteenth-century German philosophy in terms of a narrative involving Kant and the idealists who followed him, and people who don't fit into that narrative are simply ignored, by and large.

Similarly with dictionaries: they concentrate on the canon (and enough other authors to be able to illuminate the meaning of the canon), and why shouldn't they? Most users of dictionaries use them to read the canon. Hence the peaks. And these effects get compounded when you look at literature from the past, because the canon survives preferentially: there are more copies of the canon, and people value the canon more, so more of it survives. I do not think that the OED, even though it is admirably non-normative and comprehensive and so on, is immune to these effects (it is very hard to avoid them when dealing with historical texts). But it's not a corpus, it's a dictionary.


On Tue, Apr 26, 2011 at 09:51:26AM -0400, John F. Sowa wrote:
> On 4/25/2011 5:12 PM, chris brew wrote:
> >I think part of the 1600 bump must correspond to William Shakespeare
> >(1564-1616, first folio published 1623, second folio published 1632)
> >and that a corresponding bump from 1380-1400 corresponds to Chaucer (you
> >have to set the granularity to 10 years to see it clearly)
> >
> >Something else happened in the 1650-1659 decade. I have a plausible
> >hypothesis but no more...
> Those are interesting hypotheses about the effects of literature
> and the methods of recording, distribution, and preservation.
> Some of those effects are probably distorted by historical accidents
> of loss and preservation. But the decisions of editors about which
> sources to consider would also influence the results.
> Ted Pedersen:
> >... there are local peaks around the years 1400, 1600, and 1900,
> >with valleys around 1500, 1750, and the present day.
> I can't believe that the present day with the huge expansion
> of the WWW is a true valley. And the valley around 1750 was
> a period of active colonization that may have produced many
> words that weren't recorded in the OED sources.
> It would be interesting to do a more detailed study of word
> creation and disuse by going back to the original documents,
> when more of them become digitized.
> John Sowa
