[Corpora-List] Corpus heterogeneity

Adam Kilgarriff adam at lexmasterclass.com
Wed Nov 7 08:27:43 CET 2012


- Adam Kilgarriff. Comparing Corpora <http://kilgarriff.co.uk/Publications/2001-K-CompCorpIJCL.pdf>. 2001. *International Journal of Corpus Linguistics* 6 (1): 1-37.

- Reprinted in *Corpus Linguistics: Critical Concepts in Linguistics.* Teubert and Krishnamurthy, editors. Routledge. 2007.


(with work on this from back in the 20th century; I think it stands up OK. We are currently reviewing the definition of 'corpus heterogeneity' given there, and implementing an improved version for viewing in the Sketch Engine. In brief, the new definition builds on a definition of corpus similarity: heterogeneity is "the similarity between the two most different parts". We cluster documents to identify the two most different parts.)
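
As a rough illustration of that definition (a minimal sketch, not the Sketch Engine implementation): the code below assumes documents are already tokenised, uses cosine similarity over relative word frequencies as a stand-in for the corpus similarity measure, and replaces clustering with a brute-force search over every two-way split, which is only practical for a handful of documents. The function names profile, similarity and heterogeneity are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def profile(docs):
    """Pool tokenised documents into one relative-frequency profile."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return {w: c / total for w, c in counts.items()}


def similarity(p, q):
    """Cosine similarity of two profiles (an assumed stand-in measure)."""
    dot = sum(v * q.get(w, 0.0) for w, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def heterogeneity(docs):
    """Return (similarity, part_a, part_b) for the two most different parts.

    Parts are tuples of document indices. Every two-way split is tried;
    this stands in for clustering and grows exponentially with len(docs).
    """
    n = len(docs)
    if n < 2:
        raise ValueError("need at least two documents")
    best = None
    # Document 0 always stays in part A, so each split is visited once.
    for k in range(n - 1):
        for extra in combinations(range(1, n), k):
            a = (0,) + extra
            b = tuple(i for i in range(n) if i not in a)
            score = similarity(profile([docs[i] for i in a]),
                               profile([docs[i] for i in b]))
            if best is None or score < best[0]:
                best = (score, a, b)
    return best
```

Under this sketch a lower returned similarity means a more heterogeneous corpus; for a real corpus the exhaustive search would have to give way to an actual clustering step.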


On 6 November 2012 15:33, Stefan Th. Gries <stgries at gmail.com> wrote:

> Dear Alexander
> Please see: Gries, Stefan Th. 2006. Exploring variability within and
> between corpora: some methodological considerations. Corpora 1(2).
> 109-151.
> Cheers,
> --
> Stefan Th. Gries
> -----------------------------------------------
> University of California, Santa Barbara
> http://www.linguistics.ucsb.edu/faculty/stgries
> -----------------------------------------------

--
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>
Visiting Research Fellow, University of Leeds <http://leeds.ac.uk>
*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>
*DANTE: a lexical database for English* <http://www.webdante.com>
========================================
