Discriminating between Similar Languages (DSL) Shared Task at the VarDial Workshop, COLING 2014, Dublin, Ireland.
DSL Shared Task: http://corporavm.uni-koeln.de/vardial/sharedtask.html
VarDial Workshop: http://corporavm.uni-koeln.de/vardial/
Discriminating between similar languages and language varieties is one of the bottlenecks of language identification, and it has been the topic of a number of papers published in recent years. The DSL shared task aims to provide a dataset for evaluating how well systems discriminate between 13 languages and language varieties in 6 language groups.
We invite researchers and developers to participate. To receive the training data, please register before March 20th at: http://goo.gl/A3Dd49
The best systems will be invited to submit a short paper to appear in the VarDial workshop proceedings.
We will first provide a set of 20,000 instances per language (18,000 for training and 2,000 for development) in CSV format. Each instance is a full sentence extracted from journalistic corpora, written in one of the languages and tagged with its language group and country of origin. One month later we will release a test set containing 1,000 unlabeled instances per language, which are to be classified according to country of origin.
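As an illustration only, the sketch below reads instances from a CSV file and counts them per label, which is one way to check the 18,000/2,000 split after download. The column order (sentence, group, country) and the country codes in the demo are assumptions made for this example, not the official file layout.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, NamedTuple, TextIO


class Instance(NamedTuple):
    """One labelled sentence; the field order is a hypothetical layout."""
    sentence: str
    group: str    # language group, e.g. "A" (assumed encoding)
    country: str  # country of origin, the label to be predicted


def load_instances(handle: TextIO) -> List[Instance]:
    """Parse rows assumed to be (sentence, group, country); shorter rows are skipped."""
    reader = csv.reader(handle)
    return [Instance(*row[:3]) for row in reader if len(row) >= 3]


def label_counts(instances: Iterable[Instance]) -> Counter:
    """Count instances per (group, country) pair."""
    return Counter((inst.group, inst.country) for inst in instances)


if __name__ == "__main__":
    # Tiny in-memory sample with made-up labels, standing in for a real split.
    sample = io.StringIO('"Bom dia, tudo bem?",B,BR\n"Olá, está tudo bem?",B,PT\n')
    print(label_counts(load_instances(sample)))
```

Using the standard `csv` module rather than splitting on commas matters here, since sentences from journalistic text will often contain commas themselves.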
Group A (Bosnian, Croatian, Serbian)
Group B (Brazilian Portuguese, European Portuguese)
Group C (Indonesian, Malaysian)
Group D (Czech, Slovakian)
Group E (Peninsular Spanish, Argentine Spanish)
Group F (American English, British English)
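The group structure and the per-language figures above can be written down as a small data structure, which also makes the overall dataset sizes explicit. This is a sketch: the `GROUPS` dictionary and the helper names `group_of` and `dataset_sizes` are hypothetical, and only the variety names and per-language counts come from this announcement.

```python
from typing import Dict, List

# Groups and varieties as listed in the announcement; the layout is illustrative.
GROUPS: Dict[str, List[str]] = {
    "A": ["Bosnian", "Croatian", "Serbian"],
    "B": ["Brazilian Portuguese", "European Portuguese"],
    "C": ["Indonesian", "Malaysian"],
    "D": ["Czech", "Slovakian"],
    "E": ["Peninsular Spanish", "Argentine Spanish"],
    "F": ["American English", "British English"],
}

TRAIN_PER_VARIETY = 18_000  # training instances per language
DEV_PER_VARIETY = 2_000     # development instances per language
TEST_PER_VARIETY = 1_000    # test instances per language


def group_of(variety: str) -> str:
    """Return the group letter for a variety (useful for a group-first classifier)."""
    for group, varieties in GROUPS.items():
        if variety in varieties:
            return group
    raise KeyError(variety)


def dataset_sizes() -> Dict[str, int]:
    """Total number of varieties and instances per split across all groups."""
    n = sum(len(varieties) for varieties in GROUPS.values())
    return {
        "varieties": n,
        "train": n * TRAIN_PER_VARIETY,
        "dev": n * DEV_PER_VARIETY,
        "test": n * TEST_PER_VARIETY,
    }


if __name__ == "__main__":
    print(dataset_sizes())
    print(group_of("Croatian"))
```

With 13 varieties this gives 234,000 training, 26,000 development and 13,000 test instances in total. A mapping like `group_of` is one possible design for systems that first predict the group and then the variety within it; it is not required by the task.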
We accept two kinds of submissions (please indicate which you intend to make when you fill in the registration form):
Closed submission: using only the training corpus provided by the DSL shared task.
Open submission: using any corpus for training, including the DSL one.
Important dates:
Training set release: March 20th, 2014
Test set release: April 21st, 2014
Submissions due: April 23rd, 2014 (23:59 EST)
Results announced: April 30th, 2014
Short papers deadline: May 30th, 2014
Feedback: June 20th, 2014
Camera-ready versions: June 27th, 2014
Organizers:
Marcos Zampieri (Saarland University, Germany)
Liling Tan (Saarland University, Germany)
Nikola Ljubešić (University of Zagreb, Croatia)
Jörg Tiedemann (Uppsala University, Sweden)
Contact:
Shared Task: dsl.sharedtask at gmail.com
Workshop: vardialworkshop at gmail.com