[Corpora-List] First Call for EMNLP Workshop on Arabic Natural Language Processing & Shared Task on Automatic Arabic Error Correction

Nizar Habash habash at ccls.columbia.edu
Tue Mar 25 22:04:40 CET 2014


First Call for Papers and Participation: EMNLP Workshop on Arabic Natural Language Processing, Including a Shared Task on Automatic Arabic Error Correction

Apologies for multiple postings

Please distribute to colleagues



Arabic Natural Language Processing Workshop, co-located with EMNLP 2014, Doha, Qatar

Workshop date: Saturday, October 25, 2014
Paper submission deadline: July 26, 2014
Shared task registration deadline: July 1, 2014


==================== WORKSHOP DESCRIPTION ====================

There has been considerable progress in Arabic Natural Language Processing (NLP) over the last 15 years. Many Arabic NLP (or Arabic NLP-related) workshops and conferences have taken place, both in the Arab World and in association with international conferences. Examples include the conference on Arabic Language Resources and Tools (MEDAR 2009, NEMLAR 2004), the workshop on Computational Approaches to Semitic Languages (LREC 2010, EACL 2009, ACL 2007, ACL 2005, ACL 2002, ACL 1998), the workshop on Computational Approaches to Arabic Script-based Languages (MT Summit XII 2009, LSA 2007, COLING 2004), the International Symposium on Computer and Arabic Language (ISCAL 2009, ISCAL 2007), the Colloque International sur le Traitement Automatique de la Langue Arabe (CITALA 2007), the International Symposium on Processing of Arabic (Tunisia 2002), the workshop on Arabic Language Resources and Evaluation (LREC 2002), and the workshop on Arabic Language Processing (ACL 2001), among others. This workshop follows in the footsteps of these efforts to provide a forum for researchers to share and discuss their ongoing work. It is timely given the continued rise in research projects focusing on Arabic NLP in the Arab World and the West.

We invite submissions on topics that include, but are not limited to, the following:

* Basic core technologies: morphological analysis, disambiguation, tokenization, POS tagging, named entity detection, chunking, parsing, semantic role labeling, sentiment analysis, Arabic dialect modeling, etc.

* Applications: machine translation, speech recognition, speech synthesis, optical character recognition, pedagogy, assistive technologies, social media, etc.

* Resources: dictionaries, annotated data, specialized databases, etc.

Submissions may include work in progress as well as finished work. Submissions must have a clear focus on specific issues pertaining to the Arabic language, whether standard Arabic, dialectal Arabic, or a mix of the two. Descriptions of commercial systems are welcome, but authors should be willing to discuss the details of their work. Submissions are expected to be 8 pages long, plus 2 pages for references. The workshop will also host a shared task on Arabic text error correction (details below).

=========== SHARED TASK ===========

As part of the Arabic Natural Language Processing Workshop at EMNLP 2014 (to be held in Doha, Qatar), we will conduct a shared task on Automatic Arabic Error Correction. We designed this task in the tradition of high-profile shared tasks in natural language processing, such as CoNLL's grammatical error detection and correction shared tasks in 2011-2013 and numerous machine translation campaigns by NIST, WMT, and MEDAR, among others. The task relies on resources created under the Qatar Arabic Language Bank (QALB) project (currently over 1M words of manually corrected Arabic text).

A participating system in this shared task will be given Modern Standard Arabic texts, which it must correct automatically. The input will be provided in Arabic script and in a standard Romanization scheme. It will be annotated for part of speech (at three different granularities), clitics (which appear in 20% of Arabic words), lemmas, English glosses, and dependency tree relations. All of the input text will be preprocessed in a common way, so that all participants have access to all of these features without additional overhead. An XML format will be used to encode all of this information. A participating system then returns a corrected version of the Arabic text, one sentence per line, in an XML format. The task focuses on correction as opposed to identification; there will be no separate error identification task.

Participants need to register. Once registered, all participating teams will be provided with a common training data set, which includes the common preprocessed input and the corrected output. A common development set will also be provided. A blind test set will be used to evaluate the output of the participating teams, and an evaluation script will be provided to all teams. Participants are expected to write a short paper (4 pages, plus 2 pages for references) describing their approach, resources, and experiments. The paper must follow the standard EMNLP conference format.

=============== IMPORTANT DATES ===============

Shared task registration period: April 8, 2014 through July 1, 2014
Shared task test release: July 7, 2014
Shared task system output collection: July 18, 2014
Submission deadline (workshop and shared task papers): July 26, 2014
Author notification: August 26, 2014
Camera ready: September 15, 2014
Workshop: October 25, 2014

========== ORGANIZERS ==========

Program Co-chairs
Nizar Habash, Columbia University
Stephan Vogel, Qatar Computing Research Institute

Publication Co-chairs
Nadi Tomeh, Paris 13 University
Houda Bouamor, Carnegie Mellon University Qatar

Website Committee
Kareem Darwish, Qatar Computing Research Institute
Noura Farra, Columbia University

Shared Task Committee
Behrang Mohit, Carnegie Mellon University Qatar
Alla Rozovskaya, Columbia University
Wajdi Zaghouani, Carnegie Mellon University Qatar
Ossama Obeid, Carnegie Mellon University Qatar
Nizar Habash, Columbia University (advisory)

Program Committee Members
(TBA in Second Call)
