[Corpora-List] 1st CfP: NLPTEA 2018 Shared Task for Chinese Grammatical Error Diagnosis

Lung-Hao Lee lunghaolee at gmail.com
Mon Feb 26 09:52:41 CET 2018

---------------------------------------------------------------------------------------------------------------

The 5th Workshop on Natural Language Processing Techniques for Educational Applications (*NLPTEA 2018*) with a Shared Task for Chinese Grammatical Error Diagnosis (*CGED*)

July 19, 2018 in Melbourne, Australia (in conjunction with *ACL 2018*)

NLPTEA 2018: https://sites.google.com/view/nlptea2018/

---------------------------------------------------------------------------------------------------------------

*Call for Participation*

*NLPTEA 2018 Shared Task: Chinese Grammatical Error Diagnosis* https://sites.google.com/view/nlptea2018/shared-task


Participants need to register in order to obtain the training and test data. To register, please send the following information to *Gaoqi Rao* (raogaoqi at blcu.edu.cn):

- *Team Name* (an identifying abbreviation of your organization)

- *Organization* (affiliation)

- *Contact person* (name and email)

*Task Description*

The goal of this shared task is to develop NLP techniques that automatically diagnose (and, at least partially, correct) grammatical errors in Chinese sentences written by learners of Chinese as a Foreign Language (CFL). Four error types are defined: *redundant words* (denoted by a capital “*R*”), *missing words* (“*M*”), *word selection errors* (“*S*”), and *word ordering errors* (“*W*”). An input sentence may contain one or more such errors. The developed system should indicate which error types occur in the given sentence and the positions at which they occur.

Each input sentence is given a unique sentence number “sid”. If an input contains no grammatical errors, the system should return “*sid, correct*”. If an input sentence contains grammatical errors, each error should be reported as “*sid, start_off, end_off, error_type [, correction1, correction2, correction3]*”, that is, four required items optionally followed by corrections. Here start_off and end_off denote the positions of the first and last characters of the error, respectively, and error_type is one of the defined types: “R”, “M”, “S”, or “W”. Each character or punctuation mark occupies one position when counting.

This year, correction recommendation is introduced into the task. *For missing-word (“M”) and word-selection (“S”) errors, systems are required to recommend at most 3 corrections*. If any one of the recommended corrections is identical to the gold standard, the instance is counted as correct. Example sentences and corresponding notes follow.

- *Example 1*

Input: (sid=00038800481) 我根本不能了解这妇女辞职回家的现象。在这个时代,为什么放弃自己的工作,就回家当家庭主妇?

Output: 00038800481, 6, 7, S, 理解

00038800481, 8, 8, R

(Notes: “了解” should be “理解”. In addition, “这” is a redundant word.)

- *Example 2*

Input: (sid=00038800464)我真不明白。她们可能是追求一些前代的浪漫。

Output: 00038800464, correct

- *Example 3*

Input: (sid=00038801261)人战胜了饥饿,才努力为了下一代作更好的、更健康的东西。

Output: 00038801261, 9, 9, M, 能

00038801261, 16, 16, S, 做

(Notes: “能” is missing. The word “作” should be “做”. The correct sentence is “才能努力为了下一代做更好的”)

- *Example 4*

Input: (sid=00038801320)饥饿的问题也是应该解决的。世界上每天由于饥饿很多人死亡。

Output: 00038801320, 19, 25, W

(Notes: “由于饥饿很多人” should be “很多人由于饥饿”)
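The output format and position counting described above can be illustrated with a minimal Python sketch. It is not part of the official task tooling: the names `Diagnosis`, `parse_line`, and `span_text` are hypothetical, and the sketch assumes that positions are 1-based and inclusive (as the examples suggest) and that corrections contain no commas.

```python
from typing import List, NamedTuple, Optional

ERROR_TYPES = {"R", "M", "S", "W"}


class Diagnosis(NamedTuple):
    """One output line (hypothetical container, not an official format)."""
    sid: str
    start_off: Optional[int]   # None when the sentence is reported as correct
    end_off: Optional[int]
    error_type: str            # "R", "M", "S", "W", or "correct"
    corrections: List[str]     # at most 3, expected for "M" and "S" only


def parse_line(line: str) -> Diagnosis:
    """Parse 'sid, correct' or 'sid, start_off, end_off, error_type[, c1, c2, c3]'."""
    # Assumption: fields are comma-separated and corrections contain no commas.
    fields = [f.strip() for f in line.split(",")]
    if len(fields) == 2 and fields[1] == "correct":
        return Diagnosis(fields[0], None, None, "correct", [])
    if len(fields) < 4:
        raise ValueError(f"too few fields: {line!r}")
    sid, start, end, error_type = fields[:4]
    corrections = fields[4:]
    if error_type not in ERROR_TYPES:
        raise ValueError(f"unknown error type: {error_type!r}")
    if len(corrections) > 3:
        raise ValueError("at most 3 corrections are allowed")
    return Diagnosis(sid, int(start), int(end), error_type, corrections)


def span_text(sentence: str, diagnosis: Diagnosis) -> str:
    """Return the characters covered by a diagnosis (1-based, inclusive)."""
    return sentence[diagnosis.start_off - 1:diagnosis.end_off]
```

Applied to Example 4, `span_text` with offsets 19 and 25 selects “由于饥饿很多人”, the span named in the notes.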

*Data Sets*

The learner corpora used in our shared task were taken from the writing section of the *Hanyu Shuiping Kaoshi* (*HSK, Test of Chinese Level*). Native Chinese speakers were trained to manually annotate grammatical errors and provide corrections corresponding to each error. The data were then split into two mutually exclusive sets as follows.

- *Training Set*:

All units in this set are used to train the grammatical error diagnostic systems. Each unit contains 1 to 5 sentences with annotated grammatical errors and their corresponding corrections. All units are represented in SGML format; corrections are annotated for the missing-word and word-selection error types.

- *Test Set*:

This set consists of test sentences used for evaluating system performance. About half of these sentences are correct and contain no grammatical errors, while the other half include at least one error. The distribution of error types is similar to that of the training set.


In addition to the data sets provided, participating research teams are allowed to use other public data for system development and implementation. Use of other data should be specified in the final system report. Here are the links to download the data sets of the previous editions of this shared task.

- *IJCNLP 2017 CGED Shared Task*:


- *NLPTEA 2016 CGED Shared Task*:


- *NLPTEA 2015 CGED Shared Task*:


- *NLPTEA 2014 CGED Shared Task*:


*Evaluation Metrics*

For performance evaluation, TP (True Positive) is the number of sentences with grammatical errors that are correctly identified by the developed system; FP (False Positive) is the number of sentences in which non-existent grammatical errors are identified as errors; TN (True Negative) is the number of sentences without grammatical errors that are correctly identified as such; FN (False Negative) is the number of sentences with grammatical errors that the system incorrectly identifies as being correct.

The criteria for judging correctness are determined at four levels as follows.

- *Detection-level*:

Binary classification of a given sentence, that is, correct or incorrect, should be completely identical to the gold standard. A sentence containing an error of any type is regarded as incorrect.

- *Identification-level*:

This level can be considered a multi-class categorization problem. All error types should be clearly identified. A correct case should be completely identical to the gold standard for the given error type.

- *Position-level*:

In addition to identifying the error types, this level also judges the range in which each grammatical error occurs. That is to say, the system results should be perfectly identical to the quadruples of the gold standard.


- *Correction-level*:

For word-selection and missing-word errors, systems are required to recommend at most 3 corrections for each error. This level judges the error corrections proposed by the systems.
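One possible reading of the levels listed above is sketched below in Python. The announcement does not spell out how partial matches are counted, so this sketch assumes per-item matching: system and gold annotations are reduced to the items compared at each level, and the matched items are their intersection. The names `level_items`, `matched_items`, and `correction_hit` are hypothetical and do not describe the official scorer.

```python
from typing import List, Set, Tuple

# (sid, start_off, end_off, error_type), the quadruple named at position level
Quad = Tuple[str, int, int, str]


def level_items(quads: Set[Quad], level: str) -> Set[tuple]:
    """Reduce quadruples to the items compared at the given level."""
    if level == "detection":
        # Only whether a sentence has any error at all matters.
        return {(sid,) for sid, _, _, _ in quads}
    if level == "identification":
        # Which error types occur in which sentence.
        return {(sid, error_type) for sid, _, _, error_type in quads}
    if level == "position":
        # The full quadruple must agree.
        return set(quads)
    raise ValueError(f"unknown level: {level!r}")


def matched_items(gold: Set[Quad], system: Set[Quad], level: str) -> Set[tuple]:
    """Items on which the system agrees with the gold standard (assumed rule)."""
    return level_items(gold, level) & level_items(system, level)


def correction_hit(gold_correction: str, candidates: List[str]) -> bool:
    """An instance counts as correct if any of the first 3 candidates matches."""
    return gold_correction in candidates[:3]
```

For instance, a system that finds both error types of Example 1 but misplaces one span would match two items at identification level and only one at position level under this reading.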

The following metrics are measured at detection/identification/position-level with the help of the confusion matrix.

- *False Positive Rate* = FP / (FP+TN)

- *Accuracy* = (TP+TN) / (TP+FP+TN+FN)

- *Precision* = TP / (TP+FP)

- *Recall* = TP / (TP+FN)

- *F1* = 2*Precision*Recall / (Precision + Recall)
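The formulas above translate directly into code. The following Python sketch computes all five metrics from the confusion-matrix counts; the function name `cged_metrics` is hypothetical, and returning 0.0 when a denominator is zero is an assumption of this sketch, not a rule stated by the organizers.

```python
from typing import Dict


def cged_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the listed metrics from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "false_positive_rate": ratio(fp, fp + tn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }
```

Calling `cged_metrics(tp, fp, tn, fn)` once per level (detection, identification, position) gives one set of scores for each.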

*Important Dates*

- Registration open: February 6, 2018

- Release of training data: February 26, 2018

- Registration close: April 20, 2018

- Release of test data: April 25, 2018

- Testing results submission due: April 27, 2018

- Release of evaluation results: April 30, 2018

- Technical report submission due: May 14, 2018

- Report reviews returned: May 21, 2018

- Camera-ready due: May 28, 2018

- Workshop dates: July 19, 2018

*Workshop Organizers*

- Yuen-Hsien Tseng, National Taiwan Normal University

- Hsin-Hsi Chen, National Taiwan University

- Vincent Ng, The University of Texas at Dallas

- Mamoru Komachi, Tokyo Metropolitan University

*Shared Task Organizers*

- Gaoqi Rao, Beijing Language and Culture University

- Lung-Hao Lee, National Taiwan Normal University

- Baolin Zhang, Beijing Language and Culture University

- Endong Xun, Beijing Language and Culture University

- Liang-Chih Yu, Yuan Ze University

-- Lung-Hao Lee (李龍豪), Ph.D. Postdoctoral Fellow & Adjunct Assistant Professor Graduate Institute of Library and Information Studies National Taiwan Normal University http://www.lhlee.net/
