[Corpora-List] Uses of N-grams?

Kilian Evang maschinenraum at texttheater.net
Thu Jul 18 10:35:26 CEST 2013

Dear Cedric,

n-grams are also widely used in computational linguistics to process natural language automatically, based on statistical information.

A simple example is a language model: given the beginning of a sentence, which word is likely to appear next? Counting n-grams in a large corpus can help predict this. To some extent, the same counts can be used for fluency ranking, i.e. automatically assessing how "natural" a text sounds, for example one produced by a language learner or by a machine translation system.
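To make the language-model idea concrete, here is a minimal sketch of a bigram model in Python. The toy corpus, the function names predict_next and log_prob, and the choice of add-one smoothing are all my own assumptions for illustration; a usable model would need far more text and a better smoothing method.

```python
from collections import Counter, defaultdict
import math

# Toy corpus, invented for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

bigram_counts = defaultdict(Counter)  # previous word -> counts of following words
context_counts = Counter()            # how often each word occurs as a context
vocab = set()

for sentence in corpus:
    # Sentence boundary markers let the model learn typical first and last words.
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    vocab.update(tokens)
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[prev][word] += 1
        context_counts[prev] += 1

def predict_next(prev):
    """Return the word seen most often after `prev`, or None if `prev` is unseen."""
    followers = bigram_counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def log_prob(sentence):
    """Add-one smoothed bigram log-probability; higher suggests more fluent text."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        count = bigram_counts[prev][word] if prev in bigram_counts else 0
        total += math.log((count + 1) / (context_counts[prev] + len(vocab)))
    return total
```

With this toy data, predict_next("the") returns "cat", and log_prob gives "the cat sat on the mat" a higher score than the same words in scrambled order, which is the basic idea behind fluency ranking. Scores are only comparable between texts of similar length.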

Another example is part-of-speech tagging: the word "cap" can be either a verb or a noun, but the context should disambiguate it. The n-grams in which the word appears can provide that context.
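As a rough sketch of how n-gram context can disambiguate a tag, the Python snippet below counts which tag a word receives after a given previous word and backs off to word-only counts when that bigram was never seen. The hand-tagged sentences, the tag names, and the function tag_word are assumptions made up for this example; practical taggers use richer models and much larger training data.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus, invented for illustration only.
tagged_corpus = [
    [("he", "PRON"), ("wore", "VERB"), ("a", "DET"), ("cap", "NOUN")],
    [("the", "DET"), ("cap", "NOUN"), ("fell", "VERB")],
    [("they", "PRON"), ("will", "AUX"), ("cap", "VERB"), ("prices", "NOUN")],
    [("we", "PRON"), ("must", "AUX"), ("cap", "VERB"), ("costs", "NOUN")],
]

bigram_tag_counts = defaultdict(Counter)  # (previous word, word) -> tag counts
word_tag_counts = defaultdict(Counter)    # word -> tag counts (backoff)

for sentence in tagged_corpus:
    prev = "<s>"  # sentence-start marker
    for word, tag in sentence:
        bigram_tag_counts[(prev, word)][tag] += 1
        word_tag_counts[word][tag] += 1
        prev = word

def tag_word(prev, word):
    """Most frequent tag for `word` after `prev`; back off to word-only counts."""
    for counts in (bigram_tag_counts.get((prev, word)), word_tag_counts.get(word)):
        if counts:
            return counts.most_common(1)[0][0]
    return None  # word never seen in training data
```

Here tag_word("the", "cap") yields "NOUN" while tag_word("will", "cap") yields "VERB": the preceding word alone is enough to separate the two readings in this toy data.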

Best, Kilian
