Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of John F. Sowa
Sent: Sunday, April 10, 2011 7:06 PM
To: corpora at uib.no
Subject: Re: [Corpora-List] Spellchecker evaluation corpus
On 4/9/2011 7:03 AM, Eric Atwell wrote:
> Jennifer Pedler's PhD developed a spelling-error detection tool,
> evaluated on a corpus of real spelling errors;
Her slides had an example from a UK corpus that would be highly
unlikely in the US: {tort, taught}.
Japanese English is much better than it used to be, but it still
has L/R confusions.
Instead of a single corpus, it would be useful to have a set of
corpora for authors with different backgrounds.
Voila! The USPTO patent corpus has lots of examples, from lots of authors, in lots of technical fields, where jargon could be detected.
-Rich
John Sowa