[Corpora-List] Query: Corpora of American and British English that can be compared?

Eric Atwell E.S.Atwell at leeds.ac.uk
Thu Dec 20 12:38:43 CET 2012


I agree with Adam and True Friend: LOB and Brown are the long-established corpora for comparing UK and US English, both from the 1960s. BUT you asked for "... sufficiently close collection procedures for the hits they return to be compared ...", which suggests you really want web-as-corpus collections gathered more recently by web crawlers? If so: the World Wide English Corpus http://www.comp.leeds.ac.uk/eric/wwe.shtml includes 2M-word samples of UK English and US English, collected using SketchEngine's WebBootCat web-as-corpus harvester, for student exercises in comparing world varieties of English.

Eric Atwell, Leeds University

On Thu, 20 Dec 2012, Adam Kilgarriff wrote:


> Dear Laure,
> the straightforward answer is the 'Brown family' corpora - Brown and LOB
> were compiled with just this kind of analysis in mind: they both date from 1961,
> and more comparable data points are available for 1991 (FROWN and FLOB) and
> (though maybe this is British English only) 1931, 1901 and 2006.
>
> You can do the comparisons easily and directly in the Sketch Engine, where
> the data is already set up (including POS tagging) and the 'Brown family'
> corpus contains all the above except the 1901 part.
>
> Regards
>
> Adam
>
> On 18 December 2012 09:23, Laure Gardelle <laure.gardelle at ens-lyon.fr>
> wrote:
> Dear colleagues,
>
> For my research I need to compare one set of agreement patterns
> in American and British English.
> So would anyone know of two corpora (one for American English,
> the other for British English) that would have sufficiently
> close collection procedures for the hits they return to be
> compared (i.e. for possible differences in proportion to be
> considered meaningful)? Ideally I am looking for contemporary
> English, but if the data are a bit older, it is not a problem.
>
> Many thanks in advance for any help with this!
>
> Laure Gardelle
>
> --
> Adam Kilgarriff, adam at lexmasterclass.com
> Director, Lexical Computing Ltd
> Visiting Research Fellow, University of Leeds
> Corpora for all with the Sketch Engine
> DANTE: a lexical database for English

-- Eric Atwell, Associate Professor, Language research group,
I-AIBS Institute for Artificial Intelligence and Biological Systems
School of Computing, Faculty of Engineering, UNIVERSITY OF LEEDS
Leeds LS2 9JT, England. TEL: 0113-3435430 FAX: 0113-3435468
WWW: http://www.comp.leeds.ac.uk/eric
http://www.comp.leeds.ac.uk/nlp
http://www.comp.leeds.ac.uk/arabic


