[Corpora-List] Chinese spelling bakeoff

Simon Smith smithsgj at gmail.com
Wed Apr 16 12:18:49 CEST 2014



> *Task Description*
> The goal of this task is to evaluate the capability of a Chinese spelling
> checker. The passage consisting of several sentences with/without spelling
> errors will be given as the input. The checker should return the locations
> of incorrect characters and suggest the correct characters. Each character
> or punctuation occupies one position for counting location. If the input
> contains no spelling errors, the system should return ?*pid, 0*?. If the
> input contains at least one spelling errors, the output format is ?*pid [,
> location, correction]+*?.

Chinese doesn't have "spelling" as such, so I'm trying to figure out what you take correct spelling in an alphabetic language to correspond to in Chinese. For me, the closest analogy would be writing the character correctly: no strokes missing, and no other compositional errors.

That can't be what you mean, though, since you're looking at electronic input. In the essays, the characters cannot possibly have missing strokes or compositional errors; the errors can only be in the choice of character. If a student writes pengyou using youmeiyou de you instead of pengyou de you, for example, is that a spelling error, since the phonetic realization of the correct and incorrect characters is the same? Or, if someone wrote yueliang de yue instead of peng, replacing the correct character with one that *looks* like it, would that count?

Or is it that any incorrect character counts as a spelling mistake? But that's not a "spelling" issue, is it?

(Does the last quoted line (?*pid) above show an example error in Chinese? I don't think Chinese characters show up properly on the Corpora list...)

___________________________

Simon Smith, PhD
Senior Lecturer
Dept of English & Languages
Coventry University

+44 2476 887 643

http://www.linkedin.com/pub/simon-smith/42/b77/173

http://tinyurl.com/simoncov


