[Corpora-List] Call for Participation: The MADAR Shared Task on Arabic Fine-Grained Dialect Identification - Colocated With WANLP and ACL 2019

Wajdi Zaghouani wajdiz at gmail.com
Wed Jan 9 17:29:24 CET 2019

(apologies for cross-posting)

==== Call for Participation ====

The MADAR Shared Task on Arabic Fine-Grained Dialect Identification - Colocated with The 4th Arabic Natural Language Processing Workshop (WANLP 2019 <http://wanlp2019.arabic-nlp.net/>) and ACL 2019 in Florence, Italy (August 1, 2019).

Website: https://sites.google.com/view/madar-shared-task/

Registration Link: https://docs.google.com/forms/d/e/1FAIpQLSe3zUMW_gWY6oHU9QHqkxN_QRAgx3Z8kY8MCaYrrfMBlZPkzQ/viewform

Introduction

Arabic dialect identification is the task of automatically labeling a segment of speech or text with the dialect it comes from. Most previous work and shared tasks on dialect identification have focused on regional-level dialect labeling, as in the efforts by Zaidan and Callison-Burch, Elfardy and Diab, and the VarDial ADI evaluation campaign. This shared task will be the first to target a large set of dialect labels at the city and country levels. The data for the shared task was created or collected under the Multi-Arabic Dialect Applications and Resources (MADAR) project.

Shared Task

There are two subtasks in this shared task.

Subtask 1: MADAR Travel Domain Dialect Identification. The data for this subtask is the same as that reported in the following papers:

Bouamor, H., Habash, N., Salameh, M., Zaghouani, W., Rambow, O., et al. (2018). The MADAR Arabic Dialect Corpus and Lexicon. In Proceedings of the 11th International Conference on Language Resources and Evaluation. (PDF: http://www.lrec-conf.org/proceedings/lrec2018/pdf/351.pdf)

Salameh, M., Bouamor, H. & Habash, N. (2018). Fine-Grained Arabic Dialect Identification. In Proceedings of the 27th International Conference on Computational Linguistics. (PDF: http://aclweb.org/anthology/C18-1113)

Subtask 2: MADAR Twitter User Dialect Identification. This is a new data set created for this shared task.

Metrics: The evaluation metrics will include precision, recall, F-score, and accuracy, in addition to a new hierarchical evaluation metric designed for Arabic dialects. Macro-averaged F-score will be the official metric.
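For readers unfamiliar with macro averaging, here is a minimal sketch of how a macro-averaged F-score can be computed: F1 is calculated separately for each dialect label and the per-label scores are then averaged with equal weight, so rare dialects count as much as frequent ones. This is an illustration only and not the official scoring script; the function name, the choice to average over every label seen in either the gold or the predicted list, and the label codes in the usage note are all assumptions.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over all labels seen in gold or pred.

    Illustrative sketch only; the official scoring script may
    define the label set or handle edge cases differently.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true label but was missed

    labels = set(gold) | set(pred)  # assumption: average over the union
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)

    return sum(scores) / len(scores) if scores else 0.0
```

As a usage example with hypothetical city codes, macro_f1(["CAI", "CAI", "DOH", "TUN"], ["CAI", "DOH", "DOH", "TUN"]) gives per-label F1 scores of 2/3, 2/3 and 1, so the macro average is 7/9 (about 0.78), while plain accuracy on the same predictions is 0.75.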

Participants need to register using the registration link below. All participating teams will be provided with a common training data set and a common development set. No external manually labelled data sets are allowed. A blind test data set will be used to evaluate the output of the participating teams. An evaluation script will also be provided to all the teams. All teams are required to report results on both the development and test sets in their write-ups.

The shared task will be hosted through CODALAB (Links TBD).

Registration Link: https://docs.google.com/forms/d/e/1FAIpQLSe3zUMW_gWY6oHU9QHqkxN_QRAgx3Z8kY8MCaYrrfMBlZPkzQ/viewform

IMPORTANT DATES

December 10, 2018: First announcement of the shared task

January 7, 2019: Announcement of shared task website and the beginning of registration

January 28, 2019: Release of initial training data and scoring script

March 18, 2019: Final training data release

April 29, 2019: Registration deadline

May 6, 2019: Test set made available

May 13, 2019: Systems' outputs collected

May 27, 2019: Shared task system paper submissions due

June 17, 2019: Notification of acceptance

June 24, 2019: Camera-ready version of shared task system papers due

August 1, 2019: ACL 2019 Workshop in Florence

TASK ORGANISERS

Houda Bouamor (Fortia Financial Solutions, France)
Sabit Hasan (Carnegie Mellon University Qatar, Qatar)
Nizar Habash (New York University Abu Dhabi, UAE)

CONTACT

For any questions related to this task, please post to this Google group, or contact the organizers directly using the following email address: madar.shared.task at gmail.com


*Wajdi Zaghouani, Ph.D.*

*Assistant Professor*
College of Humanities and Social Sciences

P.O. Box 34110 | Education City | Doha, Qatar tel: +974 4454 5601 | mob: +974 33454992

wzaghouani at hbku.edu.qa | Office A141, LAS Building
