[Corpora-List] New COCA-based resource: www.wordandphrase.info

Mark Davies Mark_Davies at byu.edu
Mon Jan 2 15:53:56 CET 2012

The following might be of interest to those who use corpora for language teaching and learning, and perhaps to those interested in lexicography. Those interested in using corpora to teach English for Academic Purposes (EAP) might also take a look at the note at the end.


We have just released a new interface for the 425 million word Corpus of Contemporary American English (COCA):

www.wordandphrase.info

Even more so than the standard COCA interface (which will continue to be available), the new website is designed to provide information on many different aspects of a word and its usage -- all on one screen. Users can browse through the frequency listing (lemmas 1-60,000 in the corpus) or look for specific words, and then for any matching words they can see:

-- the definition(s) of the word (based on WordNet)
-- the overall frequency in the 425 million word corpus, and its rank (1-60,000)
-- the frequency in each of the five main genres -- spoken, fiction, magazines, newspapers, and academic
-- 20-30 collocates, which of course provide useful insight into meaning and usage
-- 200 concordance lines (re-sortable), which provide insight into the patterns in which the word occurs
-- synonyms (grouped by meaning and sorted by frequency); can click to see the entries for related words
-- WordNet entries, showing related words with a more specific or a more general meaning

As noted, all of this information is displayed together on one screen, with extensive links from one word to another. For example, you can click on any of the 20-30 collocates or any word in the concordance lines to generate a new concordance display for a specific node/collocate pair. Or you can click on any of the synonyms or the WordNet entries to generate a new display, and thus follow a "chain" of related words.

If you are interested in English words and their frequency, genre distribution, meaning, the relationship to related words, and the patterns in which a word occurs, we believe that this new resource will be quite useful for you in your teaching, learning, and research. And as always, it is available for free -- no annual subscription fees for individuals or institutions.

As a final note, in the next month or two we'll be releasing a related resource -- a special version of www.wordandphrase.info that is oriented to English for Academic Purposes (EAP). It will have the same functionality as above, but will be limited to just the 85 million words of academic texts in COCA. The word list will be based on words with a much higher frequency in those academic texts than in the other genres, with frequency shown by academic sub-genre (medical, legal, education, social sciences, humanities, etc.), and all collocates and concordance lines will be limited to just the academic genre.
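
To make the idea of "a much higher frequency in academic texts than in other genres" concrete, here is a minimal sketch of one way such a comparison could be done. It is an illustration only, not the procedure used for the site: the function names, the word counts, and the 1.5 cutoff are all invented for the example. The only figures taken from this message are the corpus sizes (85 million academic words out of 425 million, which leaves 340 million for the other genres).

```python
ACADEMIC_SIZE = 85_000_000   # academic words in COCA (from this message)
OTHER_SIZE = 340_000_000     # 425 million total minus the academic section


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def academic_ratio(acad_count, other_count,
                   acad_size=ACADEMIC_SIZE, other_size=OTHER_SIZE):
    """Ratio of per-million frequency in academic texts vs. other genres."""
    acad = per_million(acad_count, acad_size)
    other = per_million(other_count, other_size)
    if other == 0:
        return float("inf")  # word occurs only in academic texts
    return acad / other


def select_academic_words(counts, min_ratio=1.5):
    """Return words whose ratio meets the (hypothetical) cutoff,
    sorted from most to least strongly academic."""
    ratios = {w: academic_ratio(a, o) for w, (a, o) in counts.items()}
    selected = [w for w, r in ratios.items() if r >= min_ratio]
    return sorted(selected, key=lambda w: ratios[w], reverse=True)


# Invented counts: word -> (count in academic, count in other genres)
sample_counts = {
    "hypothesis": (8_500, 1_700),
    "table": (17_000, 34_000),
    "walk": (850, 68_000),
}

academic_words = select_academic_words(sample_counts)
```

With these made-up counts, "hypothesis" and "table" pass the cutoff and "walk" does not. Normalizing to per-million frequency first matters because the academic section is only a fifth of the corpus, so raw counts are not directly comparable.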

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================
