[Corpora-List] Reducing n-gram output

J Washtell lec3jrw at leeds.ac.uk
Tue Oct 28 23:18:31 CET 2008

Quoting maxwell at umiacs.umd.edu:

> Justin Washtell wrote:
>> ...If you start at the character level, rather than the word level,
>> then you get morphological analysis for free!
> Well, morphological analysis is a little more complicated than that :-).
> For one thing, there are plenty of very common substrings that are not
> morphemes.

Yes, and language is nothing if not exceptions, but it is a remarkably good start for such a simple rule. It is telling that many participants in MorphoChallenge and similar evaluations take a "compactness" approach, with considerable success; see Creutz & Lagus (2006), "Morfessor". But yes, point taken: "morphological analysis for free" should really come with a health warning.
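To make the "compactness" idea concrete, here is a minimal sketch, assuming a much-simplified two-part cost rather than Morfessor's actual model or search procedure. The word list, the hand-chosen segmentations and the uniform 26-letter alphabet cost are all hypothetical. Each distinct morph is spelled out once in a lexicon, and each morph token in the corpus is then coded as a fixed-length pointer into that lexicon. Splitting off shared pieces such as "walk", "ed" and "ing" shrinks the lexicon enough to pay for the extra pointers:

```python
import math

# Hypothetical toy data: each word is given as a list of morphs.
UNSEGMENTED = [["walk"], ["walked"], ["walking"], ["talk"], ["talked"], ["talking"]]
SEGMENTED = [["walk"], ["walk", "ed"], ["walk", "ing"],
             ["talk"], ["talk", "ed"], ["talk", "ing"]]

# Assumption: every lexicon character costs the same, uniform over 26 letters.
BITS_PER_CHAR = math.log2(26)


def description_length(corpus):
    """Crude two-part code length in bits: lexicon cost plus corpus cost.

    This is an illustrative simplification, not Morfessor's cost function.
    """
    tokens = [morph for word in corpus for morph in word]
    lexicon = set(tokens)
    # Lexicon cost: spell out each distinct morph once, character by character.
    lexicon_bits = sum(len(m) for m in lexicon) * BITS_PER_CHAR
    # Corpus cost: each morph token is a fixed-length index into the lexicon.
    corpus_bits = len(tokens) * math.log2(len(lexicon))
    return lexicon_bits + corpus_bits


if __name__ == "__main__":
    for name, corpus in [("unsegmented", UNSEGMENTED), ("segmented", SEGMENTED)]:
        # Prints roughly 175.3 for unsegmented and 81.1 for segmented.
        print(name, round(description_length(corpus), 1))
```

A real system searches over candidate segmentations for the one that minimises such a cost, which is also where the warning above bites: frequent non-morphemic substrings can lower the cost just as well as genuine morphemes.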

Justin Washtell
University of Leeds
