However, just for the record, I don't think the BNC ever claimed to be representative of language usage as a whole. Its design principles, as I understood them at least, were chiefly to give "equal time" to as many as possible of the discernible varieties of late 20th c. English, impressionistically defined, and within the bounds of what was economically feasible at the time. Which is clearly not the same thing at all.
On 03/02/15 15:52, Krishnamurthy, Ramesh wrote:
> Hi Angus
> As we have no adequate way of estimating language usage,
> and corpora are samples of language usage,
> is there any point in discussing 'representativeness' again?
> Or has there been an advance in estimating language usage
> in the past 30 years that I am unaware of?
> Date: Mon, 02 Feb 2015 22:40:04 -0500
> From: Angus Grieve-Smith <grvsmth at panix.com>
> Subject: Re: [Corpora-List] An ignorant question concerning the basics
> of statistical significance
> To: corpora at uib.no
> I know that David Lee had problems with the representativeness of
> the BNC, but I believe that Tony McEnery, at least, is on the list, so
> he can maybe tell us more about why the BNC is representative, and of what.