I'm guessing you're looking for tests that help you assess the statistical significance of your query results? A good starting point may be: Gries, Stefan Th. 2010. Useful statistics for corpus linguistics. In Aquilino Sánchez & Moisés Almela (eds.), A mosaic of corpus linguistics: selected approaches, 269-291. Frankfurt am Main: Peter Lang. (http://www.linguistics.ucsb.edu/faculty/stgries/research/overview-research.html)
On Mon, 03 Mar 2014 11:28:35 +0100 corpora-request at uib.no wrote:
> Date: Fri, 28 Feb 2014 11:16:11 -0500
> From: Brian Schanding <bschanding at gmail.com>
> Subject: [Corpora-List] Considering Distributions Across Texts
> To: corpora at uib.no
> I'm working on research with learner corpora. My corpora aren't that big
> (approx. 250,000 words in about 300-400 text files). I wonder what
> research/textbook sources anyone can point me to that discuss the
> importance of considering how many texts in the corpus a language feature
> occurs in (as opposed to merely considering overall frequency of a language
> feature within a corpus).
> Many Thanks!
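To make the distinction in the question concrete: two features can have the same overall frequency but very different spreads across texts. Below is a minimal Python sketch, not taken from the Gries chapter cited above, that computes two dispersion measures for a feature. The first is range, i.e. the number of texts the feature occurs in. The second is Gries's DP ("deviation of proportions"), which compares each text's share of the feature's occurrences with that text's share of the corpus size: 0 means a perfectly even spread, and values approaching 1 mean the feature is concentrated in very few texts. The function names and the toy counts are hypothetical, made up purely for illustration, and the sketch assumes you already have per-text feature frequencies and per-text word counts.

```python
from typing import Sequence


def range_count(freqs: Sequence[int]) -> int:
    """Range: number of texts in which the feature occurs at least once."""
    return sum(1 for f in freqs if f > 0)


def dp(freqs: Sequence[int], sizes: Sequence[int]) -> float:
    """Gries's DP (deviation of proportions).

    freqs[i] = frequency of the feature in text i
    sizes[i] = size of text i in words
    Returns 0.0 for a perfectly even spread; larger values mean the
    feature is concentrated in fewer texts than their sizes predict.
    """
    if len(freqs) != len(sizes):
        raise ValueError("freqs and sizes must have the same length")
    total_freq = sum(freqs)
    total_size = sum(sizes)
    if total_freq == 0:
        raise ValueError("the feature does not occur in the corpus")
    # Half the summed absolute differences between observed proportions
    # (share of the feature's hits) and expected proportions (share of words).
    return 0.5 * sum(
        abs(f / total_freq - s / total_size) for f, s in zip(freqs, sizes)
    )


if __name__ == "__main__":
    # Hypothetical toy corpus: 4 texts of 1,000 words each.
    sizes = [1000, 1000, 1000, 1000]
    feature_a = [10, 10, 10, 10]  # total frequency 40, spread evenly
    feature_b = [40, 0, 0, 0]     # total frequency 40, all in one text
    for name, freqs in [("A", feature_a), ("B", feature_b)]:
        print(name, sum(freqs), range_count(freqs), dp(freqs, sizes))
```

Both toy features have a total frequency of 40, but feature A occurs in all four texts with DP 0.0, while feature B occurs in only one text with DP 0.75. Raw frequency alone would treat the two features as equivalent.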