[Corpora-List] Uses of N-grams?

Hi Cedric

As we cannot be sure of the meaning or the part-of-speech of an item
from a word frequency list, are not n-grams a sort of halfway house
between word frequency lists and concordances?

To me, n-grams are just one of the tools in the corpus linguistics toolbag,
although they may be a relative newcomer and haven't grabbed the headlines
like keywords, perhaps.

If I remember correctly, at Cobuild, we first used bigrams for the BBC
dictionary (published in 1992). I don't think n-grams were a feature of
the earlier versions of WordSmith, and even in the more recent
AntConc, the n-grams option is slightly hidden.

Since the 1990s, I have used n-grams as a routine part of corpus
analysis, whenever they are available in the software I am using at the
time, for a variety of purposes (e.g. investigating language varieties in
'The Globalization of Business English?' at Complex 2001; investigating
genre features in 'A corpus-based analysis of junk emails' at LREC
2002; and recently, comparing Business Spanish and Business French
in research for the COMENEGO project).

Access to Google n-grams seems to have sparked interest in studies
into historical changes in social, cultural, and political values?




Regarding n-grams (highly frequent word sequences like "on the other hand" or "why don't you"), does anybody have any uses for them apart from language teaching?

Most literature dealing with n-grams seems to apply them to foreign language teaching, second language acquisition, or English for X purposes. Any other uses?

Best wishes,

