[Corpora-List] (Google Books) n-grams

Mark Davies Mark_Davies at byu.edu
Thu Jul 18 18:22:30 CEST 2013



>> Access to Google n-grams seems to have sparked interest in studies into historical changes in social, cultural, and political values?

The problem is, the standard Google Books "n-grams" site (http://books.google.com/ngrams/) doesn't do much with the n-grams themselves beyond searching for *specific, exact phrases* entered by the user. For example, it can't find the most common adjectives near "food", or the most common nouns near "fast".
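To show what "the most common adjectives near 'food'" means in terms of the n-gram data, here is a minimal sketch that aggregates over POS-tagged bigram counts. It assumes a hypothetical in-memory input of `(bigram, count)` pairs with tokens written as `word_TAG` (e.g. `fast_ADJ`); the tag names, that format, and the helper names `split_token` and `neighbours` are illustrative assumptions, not the format of any particular download or the method used at googlebooks.byu.edu. It only looks at immediate neighbours, which avoids double-counting the same occurrence across overlapping longer n-grams.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed (hypothetical) input: (bigram, count) pairs, each token written as
# "word_TAG", e.g. ("fast_ADJ food_NOUN", 50). Real n-gram files may differ.


def split_token(token: str) -> Tuple[str, str]:
    """Split 'word_TAG' into (word, TAG); an untagged token gets an empty tag."""
    word, sep, tag = token.rpartition("_")
    return (word, tag) if sep else (token, "")


def neighbours(bigrams: Iterable[Tuple[str, int]], node: str, tag: str,
               side: str = "left") -> Counter:
    """Sum counts of words tagged `tag` immediately to the `side` of `node`."""
    totals: Counter = Counter()
    for bigram, count in bigrams:
        parts = bigram.split()
        if len(parts) != 2:
            continue  # skip anything that is not a bigram
        first, second = (split_token(t) for t in parts)
        # For side="left" the node is the second token; for "right", the first.
        node_tok, other = (second, first) if side == "left" else (first, second)
        if node_tok[0].lower() == node and other[1] == tag:
            totals[other[0].lower()] += count
    return totals


# Invented toy counts, only to show the aggregation:
toy = [("fast_ADJ food_NOUN", 50), ("healthy_ADJ food_NOUN", 20),
       ("junk_NOUN food_NOUN", 30), ("fast_ADJ car_NOUN", 10)]
print(neighbours(toy, "food", "ADJ").most_common())  # [('fast', 50), ('healthy', 20)]
print(neighbours(toy, "fast", "NOUN", side="right").most_common())  # [('food', 50), ('car', 10)]
```

An exact-phrase search can only report "fast food" once you have already thought of it; grouping by the neighbour's tag is what turns the n-gram table into a list of candidate collocates.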

At http://googlebooks.byu.edu/, though, much more of the potential of the Google Books n-grams data is available -- for research on historical and cultural shifts. For a number of examples, see: http://googlebooks.byu.edu/compare-googleBooks.asp.

And for those who want access to the n-grams from the 400-million-word Corpus of Historical American English (http://corpus.byu.edu/coha/), there are freely available n-grams as well: http://www.ngrams.info/download_coha.asp. (This is in addition to the n-grams from the Corpus of Contemporary American English, COCA: http://www.ngrams.info/).
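For historical work with decade-based n-gram counts, raw frequencies are usually normalized, since the amount of text per decade generally differs. The sketch below converts raw counts to frequency per million words. It assumes plain dictionaries (decade to raw count, decade to total words) as input; it does not assume anything about the layout of the downloadable files, and the decade totals would have to come from the corpus documentation. The function name `per_million` is hypothetical.

```python
from typing import Dict


def per_million(counts: Dict[int, int], totals: Dict[int, int]) -> Dict[int, float]:
    """Frequency per million words for each decade in `totals`.

    counts: decade -> raw frequency of one n-gram (missing decades count as 0)
    totals: decade -> total number of words in the corpus for that decade
    """
    return {d: counts.get(d, 0) * 1_000_000 / totals[d] for d in sorted(totals)}


# Invented toy numbers, purely to show the arithmetic:
toy_counts = {1900: 20, 1950: 45, 2000: 90}
toy_totals = {1900: 10_000_000, 1950: 15_000_000, 2000: 30_000_000}
print(per_million(toy_counts, toy_totals))  # {1900: 2.0, 1950: 3.0, 2000: 3.0}
```

In the toy numbers the raw count rises 4.5 times between 1900 and 2000, but the normalized frequency only rises 1.5 times, because the later decade contains three times as much text.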

Best,

Mark D.

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================


> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Krishnamurthy, Ramesh
> Sent: Thursday, July 18, 2013 10:02 AM
> To: cedric.krummes at uni-leipzig.de
> Cc: corpora at uib.no
> Subject: [Corpora-List] (no subject)
>
> Hi Cedric
>
> As we cannot be sure of the meaning or the part-of-speech of an item
> from a word frequency list, are not n-grams a sort of halfway house
> between word frequency lists and concordances?
>
> To me, n-grams is just one of the tools in the corpus linguistics toolbag,
> although it may be a relative newcomer, and hasn't grabbed the headlines
> like keywords, perhaps.
>
> If I remember correctly, at Cobuild, we first used bigrams for the BBC
> dictionary (published in 1992). I don't think n-grams was a feature of
> the earlier versions of WordSmith, and even in the more recent
> AntConc, the n-grams option is slightly hidden.
>
> Since the 1990s, I have used n-grams as a routine part of corpus
> analysis, if they are available in the software I am using at the time,
> for a variety of purposes (e.g. investigating language varieties in 'The
> Globalization of Business English?' at Complex 2001; investigating
> genre features in 'A corpus-based analysis of junk emails' at LREC
> 2002; and recently, comparing Business Spanish and Business French
> in research for the COMENEGO project).
>
> Access to Google n-grams seems to have sparked interest in studies
> into historical changes in social, cultural, and political values?
>
> best
>
> Ramesh
>
> -----------------------------------------------------------------------
>
> Date: Thu, 18 Jul 2013 09:51:30 +0200
> From: Cedric Krummes <cedric.krummes at uni-leipzig.de>
> Subject: [Corpora-List] Uses of N-grams?
> To: Corpora at uib.no
>
> Hello,
>
> Regarding n-grams (highly frequent word sequences like "on the other hand"
> or "why don't you"), does anybody know of any uses for them apart from
> language teaching?
>
> Most literature dealing with n-grams seems to apply them to foreign
> language teaching, second language acquisition, or English for X purposes.
> Any other uses?
>
> Best wishes,
>
> Cédric Krummes
> --
> Dr. Cédric Krummes
>
> Universität Leipzig · +49-341-97-37404
> http://www.cedrickrummes.org/contact.php
>
>
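Cédric's description of n-grams as highly frequent word sequences ("on the other hand", "why don't you") can be made concrete with a minimal sketch of what an n-gram option in a concordancer does in principle: slide a window of n tokens over the text and count the sequences. The regex tokenizer, the `min_count` threshold, and the function names are assumptions for illustration; WordSmith, AntConc, and the Google Books pipeline each apply their own tokenization rules, which this does not reproduce (for instance, it drops non-ASCII letters).

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Rough tokenizer (assumption): lowercase runs of ASCII letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous, overlapping n-token sequences, in text order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(text: str, n: int,
                    min_count: int = 2) -> List[Tuple[Tuple[str, ...], int]]:
    """n-grams occurring at least `min_count` times, most frequent first."""
    counts = Counter(ngrams(tokenize(text), n))
    return [(g, c) for g, c in counts.most_common() if c >= min_count]


sample = "On the other hand, why don't you go? On the other hand, stay."
print(frequent_ngrams(sample, 4))  # [(('on', 'the', 'other', 'hand'), 2)]
```

Each n-gram keeps a few words of context that a single-word frequency list throws away, while still being countable, which fits Ramesh's description of n-grams as a halfway house between frequency lists and concordances.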


