[Corpora-List] Spellchecker evaluation corpus

Rich Cooper rich at englishlogickernel.com
Mon Apr 11 04:25:56 CEST 2011

Comments below,


Rich Cooper
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of John F. Sowa
Sent: Sunday, April 10, 2011 7:06 PM
To: corpora at uib.no
Subject: Re: [Corpora-List] Spellchecker evaluation corpus

On 4/9/2011 7:03 AM, Eric Atwell wrote:

> Jennifer Pedler's PhD developed a spelling-error detection tool,
> evaluated on a corpus of real spelling errors;

Her slides had an example from a UK corpus that would be highly
unlikely in the US: {tort, taught}.

Japanese English is much better than it used to be, but it still
has L/R confusions.

Instead of a single corpus, it would be useful to have a set of
corpora for authors with different backgrounds.

Voila! The USPTO patent corpus has lots of examples, from lots of authors, in lots of technical fields, where jargon could be detected.


John Sowa
