[Corpora-List] Chinese spelling bakeoff

Lung-Hao Lee lunghaolee at gmail.com
Wed Apr 16 15:34:23 CEST 2014


Thanks for your comments.

The incorrect characters are visually and/or phonologically similar to the correct ones. The following real errors illustrate this:

(1) "一身" (ㄕㄣ, incorrect pronunciation) should be "一生" (ㄕㄥ).
(2) "一只" (ㄓˇ, third tone) should be "一直" (ㄓˊ, second tone).
(3) "觀心" (ㄍㄨㄢ, correct pronunciation but incorrect character) should be "關心".
(4) "無會" (ㄨˊ, second tone) should be "舞會" (ㄨˇ, third tone).
(5) "一落千仗" (a Chinese idiom; correct pronunciation but incorrect character) should be "一落千丈".
(6) "聰是" (ㄘㄨㄥ ㄕˋ; not an existing word, similar in pronunciation and character shape) should be "總數" (ㄗㄨㄥˇ ㄕㄨˋ).
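
As a rough illustration only (this is not the bakeoff's official tooling), the substitution errors above could be turned into the task's "pid [, location, correction]+" output along the following lines. The function name `format_result` and the passage ID "p1" are hypothetical, and the sketch assumes that locations are counted from 1 and that the erroneous and corrected passages have the same length, i.e. every error is a one-for-one character substitution.

```python
def format_result(pid, original, corrected):
    """Build 'pid, 0' or 'pid, location, correction[, ...]' for one passage.

    Assumptions (not confirmed by the task description): locations are
    1-based, and errors are character-for-character substitutions only.
    """
    if len(original) != len(corrected):
        raise ValueError("substitution-only comparison needs equal lengths")

    fields = [pid]
    # Each character or punctuation mark occupies exactly one position.
    for location, (wrong, right) in enumerate(zip(original, corrected), start=1):
        if wrong != right:
            fields += [str(location), right]

    # No differing characters: report the passage as error-free.
    if len(fields) == 1:
        fields.append("0")
    return ", ".join(fields)


print(format_result("p1", "一落千仗", "一落千丈"))  # p1, 4, 丈
print(format_result("p1", "關心", "關心"))          # p1, 0
```

A real checker would not have the corrected passage to compare against, of course; the sketch only shows how locations and corrections line up in the output format.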

Best, Lung-Hao

On Wed, Apr 16, 2014 at 8:19 PM, Yunqing Xia <yqxia at tsinghua.edu.cn> wrote:


>
> Hi Simon,
>
> Your observation is interesting. There is indeed no 'spelling' in Chinese,
> because Chinese characters are 'drawn' rather than 'spelled'. To me, the
> term 'spelling' only applies to languages based on letters. But I believe
> Chinese researchers can understand what 'Chinese spelling error' means,
> even though the term is not appropriate.
>
> Instead, I think 'typo' is an appropriate term for the so-called 'spelling
> error' in Chinese. A typo is an editing error produced by the various
> input methods. A typo has nothing to do with meaning, but I am not
> sure whether spelling errors cover meaning-level word misuse.
>
> cheers,
> Yunqing
>
>
>
> On 16 April 2014 18:18, Simon Smith <smithsgj at gmail.com> wrote:
>
>> > *Task Description*
>> > The goal of this task is to evaluate the capability of a Chinese spelling
>> > checker. The passage consisting of several sentences with/without spelling
>> > errors will be given as the input. The checker should return the locations
>> > of incorrect characters and suggest the correct characters. Each character
>> > or punctuation occupies one position for counting location. If the input
>> > contains no spelling errors, the system should return ?*pid, 0*?. If the
>> > input contains at least one spelling errors, the output format is ?*pid [,
>> > location, correction]+*?.
>>
>>
>> Chinese doesn't have "spelling" as such, so I'm trying to figure out
>> what you take correct spelling in an alphabetic language to
>> correspond to in Chinese. For me, the closest analogy would be
>> writing the character correctly: no strokes missing, or other
>> compositional errors.
>>
>> That can't be what you mean, though, since you're looking at
>> electronic input. In the essays, the characters cannot possibly have
>> missing strokes or compositional errors; the errors can only be in the
>> choice of character. If a student writes pengyou using youmeiyou de
>> you instead of pengyou de you, for example, is that a spelling error,
>> since the phonetic realization of the correct and incorrect characters
>> is the same? Or, if someone wrote yueliang de yue instead of peng,
>> replacing the correct character with one that *looks* like it, would
>> that count?
>>
>> Or is it that any incorrect character counts as a spelling mistake? But
>> that's not a "spelling" issue, is it?
>>
>> (Does the last quoted line ( ?*pid) above show an example error in
>> Chinese? I don't think Chinese characters show up properly on corpora
>> list...)
>> ___________________________
>>
>>
>> Simon Smith, PhD
>> Senior Lecturer
>> Dept of English & Languages
>> Coventry University
>>
>> +44 2476 887 643
>>
>> http://www.linkedin.com/pub/simon-smith/42/b77/173
>>
>> http://tinyurl.com/simoncov
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>
>