[Corpora-List] Reducing n-gram output

J Washtell lec3jrw at leeds.ac.uk
Tue Oct 28 23:18:31 CET 2008


Quoting maxwell at umiacs.umd.edu:


> Justin Washtell wrote:
>> ...If you start at the character level, rather than the word level,
>> then you get morphological analysis for free!
>
> Well, morphological analysis is a little more complicated than that :-).
> For one thing, there are plenty of very common substrings that are not
> morphemes.

Yes, and language is nothing if not a collection of exceptions, but this is a remarkably good start for such a simple rule. It is telling that many participants in Morpho Challenge and similar evaluations take a "compactness" approach (favouring the segmentation that describes the corpus most compactly), with considerable success; see Creutz & Lagus (2006), "Morfessor". But yes, point taken: "morphological analysis for free" should really come with a health warning.
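To make the "compactness" idea concrete, here is a minimal Python sketch that scores candidate segmentations of a corpus by a toy two-part code length: the bits needed to spell out a lexicon of morphs plus the bits needed to encode the corpus as a sequence of those morphs. This is a deliberate simplification and not Morfessor's actual cost function. The flat per-character lexicon cost, the maximum-likelihood token coding, the toy word list and all function and variable names are illustrative assumptions.

```python
import math
from collections import Counter

def description_length(segmented_words):
    """Toy two-part code length (in bits) of a corpus under one segmentation.

    segmented_words: a list of words, each given as a list of morph strings.
    Simplifying assumptions (NOT Morfessor's real cost function):
      * lexicon cost: each distinct morph is spelled out once, at a flat
        log2(alphabet size + 1) bits per character, plus one end-of-morph marker;
      * corpus cost: each morph token is coded with its maximum-likelihood
        probability, i.e. -log2(count / total_tokens) bits.
    """
    counts = Counter(m for word in segmented_words for m in word)
    total = sum(counts.values())
    alphabet = {ch for morph in counts for ch in morph}
    bits_per_char = math.log2(len(alphabet) + 1)
    lexicon_cost = sum((len(morph) + 1) * bits_per_char for morph in counts)
    corpus_cost = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_cost + corpus_cost

# Hypothetical toy corpus: three stems crossed with four inflections.
stems = ["walk", "talk", "jump"]
suffixes = ["", "s", "ed", "ing"]

# Three competing segmentations of the same twelve words.
whole_words = [[stem + suf] for stem in stems for suf in suffixes]
stem_suffix = [[stem, suf] if suf else [stem] for stem in stems for suf in suffixes]
characters = [list(stem + suf) for stem in stems for suf in suffixes]

for name, seg in [("whole words", whole_words),
                  ("stem + suffix", stem_suffix),
                  ("single characters", characters)]:
    print(f"{name:18s} {description_length(seg):7.1f} bits")
```

On this toy corpus the stem + suffix segmentation gets the lowest total: storing whole words bloats the lexicon, while single characters make the corpus itself expensive to encode. That trade-off is the intuition behind the compactness approach. Note that a purely frequency-driven scorer like this would just as happily favour a common substring that is not a morpheme, which is exactly the health warning above.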

Justin Washtell University of Leeds