[Corpora-List] Reducing n-gram output

maxwell at umiacs.umd.edu
Tue Oct 28 15:11:14 CET 2008


Justin Washtell wrote:
> ...If you start at the character level, rather than the word level,
> then you get morphological analysis for free!

Well, morphological analysis is a little more complicated than that :-). For one thing, there are plenty of very common substrings that are not morphemes.
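
To make that concrete, here is a minimal sketch in Python that counts character trigrams over a toy word list. The word list and the helper names (char_ngrams, count_char_ngrams) are made up for illustration; this is not anyone's actual morphology learner, just raw substring counting.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return every contiguous substring of length n in word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def count_char_ngrams(words, n):
    """Count character n-grams over a list of words (lowercased)."""
    counts = Counter()
    for w in words:
        counts.update(char_ngrams(w.lower(), n))
    return counts


# Toy word list: an illustrative assumption, not real corpus data.
words = ["the", "then", "there", "other", "mother",
         "walked", "talked", "walking"]

trigram_counts = count_char_ngrams(words, 3)

# Print trigrams from most to least frequent.
for gram, freq in trigram_counts.most_common():
    print(gram, freq)
```

On this toy list, "the" is the most frequent trigram, but in "other" and "mother" it is not a morpheme at all. Likewise "alk" occurs more often than "ing", although "ing" is the real suffix and "alk" is just the middle of two stems. Frequency alone does not separate the two cases.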

A lot of work has been done on learning morphology from corpora; one place to start is with the work by John Goldsmith and his students on Linguistica.

Mike Maxwell

CASL/ U MD


