See http://www.helsinki.fi/varieng/series/volumes/10/lijffijt_saily_nevalainen/ and the references therein, e.g. the fifth paragraph of the introduction: "Several studies (Kilgarriff 2001, Kilgarriff 2005, Evert 2006, Paquot & Bestgen 2009, Lijffijt et al. 2011, Lijffijt et al. forthcoming) have argued that statistical tests based on relative word counts per text are more appropriate for comparing corpora than statistical tests based on relative word counts per corpus, because the latter group ignores the structure of a corpus and its texts."
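As a rough illustration of the distinction (a minimal sketch, not taken from the cited studies; the function name `per_text_stats` and the toy texts are hypothetical, and it assumes each text is already tokenized and non-empty), the following Python contrasts a pooled relative frequency with per-text relative frequencies and the number of texts in which a feature occurs:

```python
def per_text_stats(texts, feature):
    """Compare a pooled (per-corpus) view of a feature with a per-text view.

    texts   -- list of token lists, one list per text (assumed non-empty)
    feature -- the token whose frequency is of interest
    Returns (pooled_rel_freq, per_text_rel_freqs, text_range).
    """
    counts = [sum(1 for tok in text if tok == feature) for text in texts]
    sizes = [len(text) for text in texts]

    # Per-corpus view: all texts are merged into one bag of words.
    pooled_rel_freq = sum(counts) / sum(sizes)

    # Per-text view: one relative frequency per text, so each text
    # counts as a single observation.
    per_text_rel_freqs = [c / n for c, n in zip(counts, sizes)]

    # Range: the number of texts in which the feature occurs at all.
    text_range = sum(1 for c in counts if c > 0)

    return pooled_rel_freq, per_text_rel_freqs, text_range


# Hypothetical toy corpus: the feature is concentrated in a single text.
toy_corpus = [
    ["the", "cat", "the", "the"],
    ["a", "dog", "ran", "off"],
    ["a", "bird", "sang", "low"],
]
pooled, per_text, text_range = per_text_stats(toy_corpus, "the")
print(pooled, per_text, text_range)
```

In this toy case the pooled figure hides the fact that all occurrences come from one text; the per-text values and the range make that visible, and the per-text values are what a test that treats texts as observations would take as input.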
--
Tanja Säily, MA, Postgraduate Student
Research Unit for Variation, Contacts and Change in English (VARIENG)
http://www.helsinki.fi/varieng/people/varieng_saily.html
On 2014-02-28, at 18:16, Brian Schanding <bschanding at gmail.com> wrote:
> I'm working on research with learner corpora. My corpora aren't that big (approx. 250,000 words in about 300-400 text files). I wonder what research/textbook sources anyone can point me to that discuss the importance of considering how many texts in the corpus a language feature occurs in (as opposed to merely considering the overall frequency of a language feature within a corpus).
> Many Thanks!
> Corpora mailing list
> Corpora at uib.no