[Corpora-List] Machine Translation and Spelling Correction

Linas Vepstas linasvepstas at gmail.com
Thu Dec 3 20:04:11 CET 2009


2009/12/3 Marcin Miłkowski <list-address at wp.pl>:
>
> For spell-checking, you can use ispell (a bit outdated), aspell (modern), or
> hunspell (good for complex compounding languages).

Naive use of spell-checkers can quickly lead to garbage and/or a combinatorial explosion. To paraphrase J. Sinclair: it is wrong to consider spelling without also considering the lexical and syntactic context in which the spelling error is made.

Speaking from experience, I've found that running text through a spell-checker before doing any other processing mostly just damages the text. The best strategy seems to be to leave the misspelled word in place -- and add it to the lexicon of your NLP or machine-translation system, which will "understand" enough of the syntactic/lexical environment to "do the right thing" with the misspelled word.
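
As a rough illustration of the point above, here is a minimal Python sketch contrasting a context-free corrector with one that only touches an unknown word when its neighbours support a candidate. Everything in it is an assumption made for the example: the toy LEXICON, the invented BIGRAMS counts, and the helper names (edits1, candidates, naive_correct, context_correct) are hypothetical. The bigram table is only a crude stand-in for the syntactic/lexical knowledge a real NLP or MT system would bring, and none of this uses ispell, aspell or hunspell.

```python
import string

# Toy lexicon and bigram counts, invented purely for illustration.
LEXICON = {"came", "from", "form", "the", "fill", "in", "lines", "wrote", "this"}
BIGRAMS = {
    ("came", "from"): 5,
    ("from", "the"): 9,
    ("the", "form"): 4,
    ("fill", "in"): 3,
    ("the", "lines"): 2,
}


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def candidates(word):
    """Lexicon words within one edit of word, sorted so results are deterministic."""
    return sorted(edits1(word) & LEXICON)


def naive_correct(tokens):
    """Context-free: replace every unknown token with its first candidate."""
    out = []
    for tok in tokens:
        cands = candidates(tok) if tok not in LEXICON else []
        out.append(cands[0] if cands else tok)
    return out


def context_correct(tokens):
    """Replace an unknown token only when a neighbouring word supports a candidate."""
    out = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            out.append(tok)
            continue
        prev = out[-1] if out else None
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None

        def score(cand):
            # Evidence from the word before and the word after the candidate.
            return BIGRAMS.get((prev, cand), 0) + BIGRAMS.get((cand, nxt), 0)

        best = max(candidates(tok), key=score, default=None)
        # With no contextual evidence, leave the unknown word in place.
        out.append(best if best is not None and score(best) > 0 else tok)
    return out


print(naive_correct(["came", "fom", "the"]))       # ['came', 'form', 'the']
print(context_correct(["came", "fom", "the"]))     # ['came', 'from', 'the']
print(naive_correct(["linas", "wrote", "this"]))   # ['lines', 'wrote', 'this']
print(context_correct(["linas", "wrote", "this"])) # ['linas', 'wrote', 'this']
```

In this toy setting the context-free pass turns "fom" into "form" regardless of its neighbours and rewrites the name "linas" as "lines", while the context-aware pass resolves "fom" differently depending on the surrounding words and leaves "linas" untouched because nothing in the context argues for changing it.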

--linas


