We looked at this problem a long time ago and found that for some words, the estimate gets monotonically better as you see more data, taking the estimate from all of the data as the yardstick. But for other words (and I don't mean obscure ones) odd patterns happen.
James Curran and Miles Osborne. A very very large corpus doesn't always yield reliable estimates. Joint CoNLL02 - Workshop on Very Large Corpora, Taipei, Taiwan, 2002. http://www.cogsci.ed.ac.uk/~osborne/convergence.ps.gz
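
To make the "yardstick" idea concrete, here is a rough sketch of how one might check a single word. It assumes the estimate is a plain relative frequency, which may not be what the paper measures, and the function names and the toy corpus are made up for illustration. It computes the estimate on growing prefixes of the corpus and compares each one with the estimate from all of the data.

```python
def prefix_errors(tokens, word, steps=10):
    """Absolute error of the relative-frequency estimate of `word` on
    growing prefixes of `tokens`, measured against the full-corpus
    estimate (the yardstick). Hypothetical helper, for illustration only."""
    n = len(tokens)
    if n < steps:
        raise ValueError("need at least `steps` tokens")
    full = tokens.count(word) / n  # estimate from all of the data
    errors = []
    seen = 0   # occurrences of `word` in the prefix so far
    start = 0  # start of the slice not yet counted
    for k in range(1, steps + 1):
        end = n * k // steps
        seen += tokens[start:end].count(word)
        start = end
        errors.append(abs(seen / end - full))
    return errors


def is_monotone(errors):
    """True if the error never increases as the prefix grows."""
    return all(b <= a for a, b in zip(errors, errors[1:]))


# Toy corpora (invented): one word spread evenly, one occurring in a burst.
even = ["a", "b"] * 50
bursty = ["b"] * 40 + ["a"] * 20 + ["b"] * 40

print(is_monotone(prefix_errors(even, "a")))
print(is_monotone(prefix_errors(bursty, "a")))
```

In the toy data the evenly spread word converges without trouble, while the bursty one overshoots after the burst and then drifts back, so its error is not monotone even though the final estimate matches the yardstick by construction.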