[Corpora-List] Frequency lists (corrected)

Mark Davies Mark_Davies at byu.edu
Mon Feb 23 18:41:42 CET 2009

There are also frequency lists for American English (based on COCA -- a balanced corpus of nearly 400 million words), TIME Magazine (100m words, 1920s-2000s), Spanish (20m words, 1900s) and Portuguese (20m words, 1900s). Also available are n-grams for all of these corpora (as well as for the BNC). See:


Also, later this year there will be a printed frequency dictionary from Routledge. It will include the top 5,000 lemmas in American English (from COCA), the top 20-30 collocates of each of these lemmas (grouped by PoS and function: subj/obj etc.), and indications of genre-based variation, etc.

Mark Davies

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================
