We are organizing an international bakeoff on *Chinese Spelling Check* at *CLP-2014 (Oct. 20-21 in Wuhan, China)*, the 3rd conference jointly organized by the Chinese Language Processing Society of China (*CIPS*) and the ACL Special Interest Group on Chinese Language Processing (*SIGHAN*). You are welcome to participate in our task.
For more information, kindly visit http://ir.itc.ntnu.edu.tw/clp2014/task2csc.html
*Introduction*

The number of people learning Chinese as a Foreign Language (CFL) has boomed in recent decades, and it is expected to grow further in the years to come. However, unlike the English learning environment, where many learning techniques have been developed, tools to support CFL learners are relatively rare, especially those that can automatically detect and correct Chinese spelling and grammatical errors. For example, Microsoft Word does not yet support these functions for Chinese, although it has supported them for English for years. In this bakeoff, essays written by CFL learners were collected for developing automatic spelling checkers. The hope is that through such evaluation campaigns, more innovative computer-assisted techniques will emerge, more effective Chinese learning resources will be built, and state-of-the-art NLP techniques will be advanced for educational applications.
*Task Description*

The goal of this task is to evaluate the capability of a Chinese spelling checker. A passage consisting of several sentences, with or without spelling errors, is given as the input. The checker should return the locations of incorrect characters and suggest the correct characters. Each character or punctuation mark occupies one position for counting locations. If the input contains no spelling errors, the system should return “*pid, 0*”. If the input contains at least one spelling error, the output format is “*pid [, location, correction]+*”.
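To make the output format concrete, here is a minimal Python sketch of how a result line could be assembled. It is an illustration only, not the official evaluation tool. The function names, the passage ID `P001`, and the example sentence are hypothetical. It also assumes that locations are counted from 1 and that a correction always replaces exactly one character, so the input and the corrected text have the same length.

```python
from typing import List, Tuple

# (location, suggested character); locations are assumed to start at 1
Correction = Tuple[int, str]


def find_corrections(original: str, corrected: str) -> List[Correction]:
    """Compare two equal-length passages character by character.

    Every character and punctuation mark occupies one position.
    """
    if len(original) != len(corrected):
        raise ValueError("passages must have the same number of characters")
    return [
        (position, new_char)
        for position, (old_char, new_char) in enumerate(
            zip(original, corrected), start=1
        )
        if old_char != new_char
    ]


def format_result(pid: str, corrections: List[Correction]) -> str:
    """Build one result line: 'pid, 0' or 'pid [, location, correction]+'."""
    if not corrections:
        return f"{pid}, 0"
    fields = [pid]
    for location, char in sorted(corrections):
        fields.append(str(location))
        fields.append(char)
    return ", ".join(fields)


if __name__ == "__main__":
    # Hypothetical passage: the sixth character is misspelled.
    errors = find_corrections("我今天很高心", "我今天很高興")
    print(format_result("P001", errors))
    print(format_result("P002", []))
```

A real checker would produce the corrections from its own detection model; the comparison helper here only stands in for that step.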
*Data Sets*

The policy of our evaluation is an open test. Participants can employ any linguistic and computational resources to develop their spelling checkers. For example, the datasets with gold standard annotation from last year's spelling check bakeoff can be freely downloaded at http://ir.itc.ntnu.edu.tw/lre/sighan7csc.html for reference. This year, we also provide passages from CFL learners' essays selected from the NTNU learner corpus for training purposes. The data will be released in SGML format. In addition, at least 1000 testing passages selected to cover different complexities will be used for testing.
- Registration for Bakeoffs open: *2014-03-20*
- Training data released: *2014-05-01*
- Dry run (format validation): *2014-05-20*
- Registration for Bakeoffs close: *2014-06-30*
- Test data released: *2014-07-30 (18:00 Beijing Time)*
- Test result submission deadline: *2014-08-01 (18:00 Beijing Time)*
- Test result evaluation released: *2014-08-20*
- Evaluation report submission deadline: *2014-08-26*
- Evaluation report reviews return: *2014-09-01*
- Final evaluation report submission deadline: *2014-09-10*
- Main Conference: *2014-10-20/21*
On behalf of the co-organizers Liang-Chih Yu, Lung-Hao Lee, Yuen-Hsien Tseng, and Hsin-Hsi Chen