Dear Robert

The relevance of n-grams cannot be established simply by looking at the list. Manual examination of instances (with sufficient co-text) is needed. Why don't you select n-grams using the same criteria you used for selecting 1-grams?

Bondi, Marina & Mike Scott (eds.). 2010. Keyness in Texts. Amsterdam/Philadelphia: John Benjamins Publishing Company. 251 pp. ISBN 978-90272-8766-3. <https://journals.openedition.org/asp/4932>

All the best,
Rodrigo


We are comparing a reference corpus and a target corpus in order to identify keywords and key phrases on a particular topic that is prominent in the target corpus but not in the reference corpus. We use log ratio and statistical significance to identify candidate keywords, i.e. 1-grams, and then go through the candidates manually to identify those that are relevant to the topic at hand (e.g. unemployment and labour relations). We remove items that are not relevant, for example those that reflect a random event such as a particular sports tournament held during the period covered by the target corpus.
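For concreteness, the candidate selection step described above could be sketched roughly as follows. This is only an illustration, not our actual pipeline: it assumes log ratio is the binary log of the ratio of relative frequencies, uses log-likelihood (G2) as the significance statistic, and the function names, the 0.5 smoothing for zero counts, and both thresholds are assumptions made for this example.

```python
import math


def log_ratio(freq_target, size_target, freq_ref, size_ref, smoothing=0.5):
    """Binary log of the ratio of relative frequencies (target vs. reference).

    Zero counts are replaced by a small constant (an assumption of this
    sketch) so that the ratio stays defined.
    """
    ft = freq_target if freq_target > 0 else smoothing
    fr = freq_ref if freq_ref > 0 else smoothing
    return math.log2((ft / size_target) / (fr / size_ref))


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) for one item across two corpora."""
    total = freq_target + freq_ref
    combined_size = size_target + size_ref
    expected_target = size_target * total / combined_size
    expected_ref = size_ref * total / combined_size
    g2 = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keyword_candidates(target_counts, size_target, ref_counts, size_ref,
                       min_log_ratio=1.0, min_g2=15.13):
    """Return (item, log ratio, G2) for items passing both thresholds.

    Both thresholds are illustrative; 15.13 roughly corresponds to
    p < 0.0001 at one degree of freedom.
    """
    candidates = []
    for item, freq_target in target_counts.items():
        freq_ref = ref_counts.get(item, 0)
        lr = log_ratio(freq_target, size_target, freq_ref, size_ref)
        g2 = log_likelihood(freq_target, size_target, freq_ref, size_ref)
        if lr >= min_log_ratio and g2 >= min_g2:
            candidates.append((item, lr, g2))
    # Strongest effect size first, ready for manual inspection.
    return sorted(candidates, key=lambda c: c[1], reverse=True)
```

The same functions work unchanged when the dictionary keys are n-grams instead of single words, which is why the question of relevance, rather than of scoring, is the hard part.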

In addition, we are looking at n-grams with n greater than 1, and we're not sure how to decide which n-grams are relevant. For example, “unemployment causes poverty” is certainly relevant. On the other hand, “unemployment is” or “the unemployed are” or “unemployment causes” are not relevant.

I would be interested in hearing about any established practices for distinguishing relevant from non-relevant n-grams, or more generally any thoughts on how this can be done in a principled way rather than by making ad hoc decisions.

A solution we have considered so far is to exclude n-grams that consist only of function words plus a single content word that we have already identified as a relevant keyword/1-gram. Other than this simple solution, we were wondering if there are more advanced approaches to this problem.
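To make the simple rule concrete, here is a minimal sketch of how it might be applied. The function-word list is a tiny illustrative stand-in (a real study would use a proper stop list or part-of-speech tags), and the names `FUNCTION_WORDS`, `is_trivial_extension` and `filter_ngrams` are made up for this example.

```python
# Tiny illustrative function-word list; an assumption of this sketch.
FUNCTION_WORDS = {
    "the", "a", "an", "is", "are", "was", "were", "be", "of", "in", "on",
    "to", "and", "or", "for", "with", "by", "that", "this",
}


def is_trivial_extension(ngram, keywords, function_words=FUNCTION_WORDS):
    """True if the n-gram is function words plus one already-known keyword."""
    tokens = ngram.lower().split()
    content = [t for t in tokens if t not in function_words]
    return len(content) == 1 and content[0] in keywords


def filter_ngrams(ngrams, keywords, function_words=FUNCTION_WORDS):
    """Keep only n-grams that are not trivial extensions of a keyword."""
    return [ng for ng in ngrams
            if not is_trivial_extension(ng, keywords, function_words)]


keywords = {"unemployment", "unemployed"}
ngrams = [
    "unemployment is",
    "the unemployed are",
    "unemployment causes",
    "unemployment causes poverty",
]
kept = filter_ngrams(ngrams, keywords)
# kept == ["unemployment causes", "unemployment causes poverty"]
```

Note that “unemployment causes” survives this filter because “causes” counts as a second content word, so the rule on its own does not remove every n-gram we consider irrelevant.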

Mailing list on varieties of English/World Englishes/ENL-ESL-EFL. Subscribe here: https://groups.google.com/forum/#!forum/var-eng/join

