The link to the free list gives an error. You may want to look into it.
Toddy
Twitter: @toddysm
Blog: http://www.toddysm.com
Sent from my Blackberry
-----Original Message-----
From: Mark Davies <Mark_Davies at byu.edu>
Sender: corpora-bounces at uib.no
Date: Mon, 21 Nov 2011 20:13:10
To: corpora at uib.no
Subject: [Corpora-List] COCA (and COHA) n-grams data
We are pleased to announce that n-grams data from the COCA and COHA corpora is now available for download (http://www.ngrams.info) -- much of it for free.
The free n-grams from COCA (http://corpus.byu.edu/coca; 425 million words, 1990-2011) contain the one million most frequent 2-, 3-, 4-, and 5-grams. The free n-grams from COHA (http://corpus.byu.edu/coha; 400 million words, 1810-2009) contain the frequency of every word, and of every 2-, 3-, 4-, and 5-gram that occurs at least three times in the corpus, along with its frequency in each of the 20 decades (1810s-2000s). Other versions of the n-grams include *all* 2-, 3-, and 4-grams from COCA (e.g. 155 million 3-grams). This n-grams data is in addition to the other COCA-based word frequency and collocates data that is available from http://www.wordfrequency.info.
One advantage of the COCA and COHA n-grams over the Google n-grams (both contemporary and historical datasets) is that the COCA / COHA n-grams are tagged for part of speech (as well as lemma, for some of the COCA datasets), and that they are based on genre-balanced corpora. In addition, it is easier to install and use these n-grams on a wide variety of platforms, since the n-grams are smaller than the billions of rows of data in the Google datasets (but still large enough to hopefully be quite useful).
Anyway, for those who might be interested -- http://www.ngrams.info.
Mark Davies Brigham Young University
============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================