With an increasing number of papers in Natural Language Processing and Computational Linguistics being authored by non-native English speakers (NNSs), we think it's time the community provided more support for those authors. As a field that works on computational techniques for processing text, we're in a better position than most to do something useful; so, the aim of this shared task, called HOO for 'Helping Our Own', is to promote the use of NLP tools and techniques to help improve the textual quality of papers written by NNSs in the field. Of course, we're not being rigidly inward-looking: techniques developed here will be useful for authors in other disciplines too, but we figure that this approach will get us maximum traction.
The task offers opportunities for researchers working in a wide range of NLP areas: spell checking, grammar checking, style checking, paraphrasing, machine translation, text compression, text simplification ... the possibilities are endless, with techniques developed for quite different purposes having potential to assist. Participating teams can choose to focus on specific subsets of errors and corrections, or to try to achieve universal repair.
The initial development data set is under construction and will be released soon; it consists of 1000-word excerpts of text from real papers that have been graciously contributed to the project by their authors, each subsequently marked up with corrections. An initial sample paper that gives a flavour of the kinds of corrections we're dealing with, and the way in which they are marked up, is available from the HOO website at http://www.clt.mq.edu.au/research/projects/hoo/. Please visit the site to register your interest and to be added to a mailing list for project updates.
More information about the aims of the project can be found in the following paper:
R. Dale and A. Kilgarriff. Helping Our Own: Text Massaging for Computational Linguistics as a New Shared Task. In Proceedings of the 6th International Natural Language Generation Conference, 7th-9th July 2010, Dublin, Ireland. [http://www.clt.mq.edu.au/research/projects/hoo/files/2010_DaleKilg_INLG_HOO.pdf]
The schedule for this initial pilot run of the shared task is as follows:
14 April: HOO launched
30 April: Announcement and public release of scripts and development data
10 June: Evaluation data available to participants
10 July: Latest date for return of corrected scripts
31 July: Results announced
Aug-Sept: Participants prepare their system descriptions and error analyses
28-30 Sept: Workshop (with ENLG, Nancy, France)
We look forward to your participation in this exercise!
Robert Dale and Adam Kilgarriff