[Corpora-List] corpora of grammatical errors

Dr CK Jung c.k.jung at gmail.com
Mon Apr 16 14:29:32 CEST 2012

Hi Anabela

This is very interesting indeed…

My name is CK Jung, from the Corpus Lab at Yonsei University in Korea. I have recently finished developing a one-million-word written corpus of Korean learners of English, called the Yonsei English Learner Corpus (YELC).


Having spent almost seven extremely 'intensive' months on it, I was totally exhausted. However, I was wondering whether it would be possible to try your 'linguistically-sophisticated grammar checker' on my corpus.

Please let me know if that is okay. I will send you some sample texts (I have got texts from 9 different levels).

All best
CK

P.S. Hi Ramesh, it's good to see you on the list and I hope all is well. I hope to see you again in Seoul. BTW, I have just published a new Korean-language introduction to corpus linguistics, together with Professor Heok-Seung Kwon of Seoul National University.

On 16 April 2012 20:43, Anabela Barreiro <barreiro_anabela at hotmail.com> wrote:
> This is funny and for fun... :)
> I (unshamefully) do not properly restrain in self assertion of being a good
> proficient second language writer (even near-native writer, in my most
> egocentric moments ;) )
> This is an example of how my previous e-mail to the list would be
> corrected/improved by a linguistically-sophisticated grammar checker
> (smarter than I was when I wrote my message):
> -----------------
> Dear Corpora-List Members, I would like to thank all who have sent me
> personal e-mails with suggestions, including indication on where to find
> corpora for languages other than English and the Romance languages.
> In reply to Ramesh,
> I would say that they all contain sentences with grammatical errors. I am
> interested in corpora with sentences that have errors demonstrating
> particular aspects of the grammar (prepositions, verb tenses, negation,
> coordination, etc., etc., etc.) with some pre-selection and
> pre-categorization of the ungrammaticality of the sentences. In the past,
> system developers used what were called "test suites", mostly fabricated by
> linguists for the specific purpose of testing a particular system, which
> included files with ungrammatical sentences. I am interested in sentences
> that come from "real" usage of language by non-native speakers, and from
> native speakers with writing difficulties or writing texts where language
> and style is not optimized and needs to be improved. When supporting editing
> of a text, existing grammar checkers are not sophisticated enough to
> identify all the grammar problems and often identify as a problem perfectly
> correct sentences (false positives and false negatives). In addition to
> correction, there is also the potential for providing better solutions for
> writing (including more categories to the typology)... For example, I can
> fix support verb constructions with "weak" verbs into semantically "strong"
> verbs, which gives the text a more professional style, eliminates words that
> are unnecessary, helps texts being translated more efficiently by humans and
> machines, etc.
> From my request on this list, I found out that there is an ongoing shared
> task concerned with the automated correction of errors in text by Robert
> Dale and Adam Kilgarriff :
> http://clt.mq.edu.au/research/projects/hoo/
> This is an especially interesting task because it groups errors into
> linguistic categories. Hoo already includes preposition and determiner
> errors in exam scripts authored by learners of English as a Second Language,
> but their goal is to enlarge the typology of linguistic errors. That's all I
> wished for :)
> -----------------
> Have a good day!
> Anabela.


---
Best regards
Dr CK Jung

Senior Research Fellow
Corpus Lab, Department of English Language and Literature, Yonsei University
50 Yonsei-ro, Seoul, 120-749, South Korea
Tel (Direct): +82 (0)2 2123 7516
Fax: +82 (0)2 362 2381
Email: ckjung at yonsei.ac.kr
http://web.yonsei.ac.kr/yonseicorpuslab

External Academic Staff (MA Thesis Supervisor & Tutor)
Center for English Language Studies, Department of English, University of Birmingham
Edgbaston, Birmingham, B15 2TT, UK
Tel: +44 (0)121 414 5695
Fax: +44 (0)121 414 3298
Email: c.k.jung at bham.ac.uk

Columnist, Monthly Chosun (the largest monthly politics and news magazine in South Korea) http://monthly.chosun.com/client/column/list.asp?C_CC=C&tbKey=C.K.Jung
Advisory Committee, Asia Pacific Corpus Linguistics Association, New Zealand
Referee, Language and Intercultural Communication (Listed in the Thomson Reuters Social Sciences Citation Index (SSCI)).
Referee, The English Linguistics Society of Korea (Listed in the Korea Citation Index (KCI)).
Referee, The Linguistic Science Society (Listed in the Korea Citation Index (KCI)).
Planning Director, The Applied Linguistics Association of Korea (ALAK), South Korea.
Information Technology Director, The Korea Association of Multimedia-Assisted Language Learning (KAMALL), South Korea
