It seems that for every conceivable NLP task there is some agreed-upon evaluation data set, or at least one that is used in several papers. Yet for some strange reason I am utterly unable to find any such test set for the spell checking task!
Am I doing something wrong, or is there no such data set? I know I can build synthetic tests by systematically inserting, swapping, etc. letters in my own test data, but that would give me results that I cannot compare to anyone else's. Hence my question: is there some accepted evaluation forum that I am missing? Whenever I include "spell check" in any form in my search queries, I get lots of tutorials on how to write a spellchecker and almost nothing else...
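For reference, the kind of synthetic test I mean could look roughly like the sketch below. It is only an illustration: the function names (`insert_letter`, `swap_letters`, `make_test_pairs`), the lowercase ASCII alphabet, and the roughly even split between the two error types are my own assumptions, not an established benchmark procedure.

```python
import random
import string


def insert_letter(word, i, ch):
    """Return word with the letter ch inserted before position i."""
    return word[:i] + ch + word[i:]


def swap_letters(word, i):
    """Return word with the letters at positions i and i + 1 swapped."""
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def make_test_pairs(words, seed=0):
    """Return (misspelled, correct) pairs with one synthetic error per word.

    Hypothetical helper: the error model is an assumption for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the test set is reproducible
    pairs = []
    for word in words:
        # Only swap adjacent letters that differ, otherwise nothing changes.
        swappable = [i for i in range(len(word) - 1) if word[i] != word[i + 1]]
        if swappable and rng.random() < 0.5:
            noisy = swap_letters(word, rng.choice(swappable))
        else:
            position = rng.randrange(len(word) + 1)
            noisy = insert_letter(word, position, rng.choice(string.ascii_lowercase))
        pairs.append((noisy, word))
    return pairs
```

Calling `make_test_pairs(["spelling", "checker"], seed=1)` returns a list of (misspelled, correct) pairs, which is exactly the problem: the numbers I get on such a list depend on my own word list and error model, so they are not comparable to anyone else's.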
Best regards, Stefan Bordag