[Corpora-List] New COCA-based "Academic Vocabulary Lists"

Mark Davies Mark_Davies at byu.edu
Tue Aug 13 18:49:37 CEST 2013

For those on CORPORA interested in Academic English, whether for teaching or learning, there are two new, free corpus-based resources. They are based on the 120 million words of academic texts in the Corpus of Contemporary American English<http://corpus.byu.edu/coca/> [COCA] (85 million words in academic journals and 25 million words in more academically-oriented magazine articles).

(Note: a previous version of these resources was made available a year or so ago. The complete version, with all entries, is available now that an article on these resources has been published in Applied Linguistics<http://applij.oxfordjournals.org/content/early/2013/08/02/applin.amt015.abstract.html?papetoc>.)

1. The site http://www.academicwords.info/ contains free COCA-based academic wordlists. There are important differences<http://www.academicwords.info/compare.asp> between these lists and the Academic Word List created by Coxhead (2000). The three sets of word lists, which have been created in conjunction with Prof. Dee Gardner of BYU, are:

-- Word families (SAMPLE<http://www.academicwords.info/samples/families.pdf>): The top 2,000 word families of academic English. Unlike the traditional Academic Word List, our word families contain separate entries for different parts of speech, so you know, for example, whether abstract is used more as a noun, verb, or adjective. The words are also color-coded to let you know whether the word is a "general" academic word, or whether it is a more "technical" one that occurs in just a few domains (e.g. Science or Law). And most importantly, the entries are listed in order of frequency, to help you focus more on words that you will actually see in the real world -- rather than just having a mass of unorganized words in each word family.

-- General "core" academic English (SAMPLE<http://www.academicwords.info/samples/general-core.pdf>): The Academic Vocabulary List (AVL) itself -- the top 3,000 words (lemmas) in COCA Academic (listed individually, rather than by word family)

-- Top 20,000 words (SAMPLE<http://www.academicwords.info/samples/allWords.pdf>) in the 120 million words of COCA-Academic, including 1) AVL words ("core" academic words), 2) domain-specific words (e.g. cell or moral), which are not in the AVL, and 3) non-academic words (e.g. try, good).
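
For anyone who wants to work with the word-family lists programmatically, the layout described above (separate entries per part of speech, a general/technical flag, and ordering by frequency) could be modelled roughly as follows. This is only a minimal sketch: the `Entry` class, its field names, and the `order_family` helper are hypothetical, and the counts are invented placeholders, not real COCA frequencies or the actual file format of the lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One word-family entry (hypothetical layout, not the published format)."""
    lemma: str
    pos: str         # part of speech, e.g. "noun", "verb", "adj"
    freq: int        # placeholder count, NOT a real COCA figure
    technical: bool  # True if the word is concentrated in a few domains


def order_family(entries):
    """Return the entries of one word family, most frequent first."""
    return sorted(entries, key=lambda e: e.freq, reverse=True)


# Invented placeholder counts, for illustration only.
family = [
    Entry("abstract", "verb", 10, False),
    Entry("abstract", "noun", 30, False),
    Entry("abstract", "adj", 20, False),
]

ordered = order_family(family)
print([(e.pos, e.freq) for e in ordered])
```

Keeping part of speech as a field on each entry, rather than merging all forms under one headword, is what makes it possible to see at a glance which use of a word dominates.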

2. The site http://www.wordandphrase.info/academic/ is an interactive, searchable interface to the academic vocabulary lists. It has the same features as the general WordAndPhrase<http://www.wordandphrase.info/> site, but all of the data is based strictly on the 120 million words of academic English in COCA.

-- Frequency listing: Browse through these lists (including word families) to see detailed information (all on one screen, with extensive links between resources): definition, frequency by academic sub-genre (e.g. Medicine, Business, Humanities), synonyms, and collocates and concordance lines (based just on academic English).

-- Input texts: As with the general interface, you can input an entire text (such as a journal article, or an academic paper that you have written) and it will give you detailed information about the words and phrases in the text. You can download word lists based on your text, and you can click on phrases in your text to see related phrases from COCA.

We hope that these two new corpus-based resources on academic English will be of interest and value to you for teaching, learning, and research.


Mark Davies

Professor of Linguistics / Brigham Young University


** Corpus design and use // Linguistic databases **

** Historical linguistics // Language variation **

** English, Spanish, and Portuguese **
