[Corpora-List] Spellchecker evaluation corpus

Stefan Bordag sbordag at informatik.uni-leipzig.de
Sat Apr 9 10:45:00 CEST 2011

Hi everyone,

It seems that for every conceivable NLP task there is some agreed-upon evaluation data set, or at least one that is used in several papers. Yet for some strange reason I seem to be utterly unable to find any such test set for the spell checking task!

Am I doing something wrong, or is there no such data set? I know I can build synthetic tests by systematically inserting, swapping, etc. letters in my own test data, but this would give me results that I cannot compare with anyone else's. Hence, is there some accepted evaluation forum that I am missing? Whenever I include "spell check" in any form in search queries, I get lots of tutorials on how to write a spellchecker and almost nothing else...
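
To make the synthetic option concrete, here is a minimal sketch in Python of what I mean by inserting, swapping, etc. letters. The names `corrupt` and `make_test_set`, the lowercase ASCII alphabet, and the choice of exactly one edit per word are only assumptions for illustration, not an established benchmark procedure, and errors generated this way need not resemble real typing mistakes.

```python
import random
import string


def corrupt(word, rng, alphabet=string.ascii_lowercase):
    """Return `word` with one random single-character edit applied.

    Hypothetical helper: the four edit types and the uniform choice
    between them are assumptions, not a model of real typing errors.
    """
    if len(word) < 2:
        return word  # too short to edit meaningfully in this sketch
    op = rng.choice(("insert", "delete", "substitute", "swap"))
    if op == "insert":
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(alphabet) + word[i:]
    if op == "delete":
        i = rng.randrange(len(word))
        return word[:i] + word[i + 1:]
    if op == "substitute":
        i = rng.randrange(len(word))
        # Exclude the current letter so the substitution changes the word.
        others = [c for c in alphabet if c != word[i]]
        return word[:i] + rng.choice(others) + word[i + 1:]
    # Swap two adjacent letters (a no-op if they happen to be identical).
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def make_test_set(words, seed=0):
    """Build (misspelling, correct word) pairs, reproducible via `seed`."""
    rng = random.Random(seed)
    return [(corrupt(w, rng), w) for w in words]


if __name__ == "__main__":
    for wrong, right in make_test_set(["corpus", "spelling", "checker"]):
        print(wrong, "->", right)
```

Fixing the seed at least makes such a test set reproducible, but it does not solve the comparability problem: the numbers would still only be meaningful against my own data.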

Best regards, Stefan Bordag
