[Corpora-List] Spellchecker evaluation corpus

Trevor Jenkins trevor.jenkins at suneidesis.com
Thu Apr 14 13:34:37 CEST 2011


On Thu, 14 Apr 2011, Stefan Bordag <sbordag at informatik.uni-leipzig.de> wrote:


> I imagine, however, that it wouldn't be conceptually difficult to set up
> a test that covers most or all of these needs you mentioned. A proper
> evaluation setup for spellchecking in general would consist of:
> - ... misspelled words along with a defined context
> - ... source of error ...
> - ... string pairs (wrong to correct) ...
> - ... spell checkers that need training data ...
> - ... resource usage ...
> - ... different languages ..

You have omitted at least one other issue: the longitudinal changes to spelling *conventions*. This is easily demonstrated in English by considering the spellings in Shakespeare, the King James Version of the Bible, and the novels of Jane Austen. Their works contain words whose then-accepted spellings are not used today. Undoubtedly one can find similar historical literature with differing spellings in other languages. There is also the forward version of the problem: contemporary spelling conventions may not be considered correct at some future date.

Regards, Trevor

<>< Re: deemed!


