[Corpora-List] Call For Participation - DSL Shared Task at VarDial Workshop - COLING 2014 in Dublin, Ireland.

Marcos Zampieri marcos.zampieri at uni-koeln.de
Sat Mar 1 16:17:39 CET 2014

Call For Participation

DSL Shared Task at VarDial Workshop - COLING 2014 in Dublin, Ireland.

DSL Shared Task: http://corporavm.uni-koeln.de/vardial/sharedtask.html
VarDial Workshop: http://corporavm.uni-koeln.de/vardial/

Discriminating between similar languages and language varieties is one of the bottlenecks of language identification, and it has been the topic of a number of papers published in recent years. The DSL shared task aims to provide a dataset for evaluating systems' performance on discriminating between 13 different languages in 6 language groups.

We invite researchers and developers to participate. To receive the training data, please register before March 20th at: http://goo.gl/A3Dd49

The best systems will be invited to submit a short paper to appear in the VarDial workshop proceedings.


We will first provide a set of 20,000 instances per language (18,000 training + 2,000 development) in CSV format. Each instance is a full sentence extracted from journalistic corpora, written in one of the languages, and tagged with its language group and country of origin. After one month, we will release a test set containing 1,000 unlabeled instances of each language, to be classified according to country of origin.
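As a rough illustration of how such a file might be read, here is a minimal Python sketch. The column order (sentence, group, label), the comma delimiter, and the label codes are assumptions made for this example; the announcement only states that the data is in CSV format and that each sentence is tagged with its language group and country of origin. The sample rows are made-up placeholders, not real data.

```python
import csv
import io
from collections import Counter

# Assumed column order (hypothetical): the announcement does not specify it.
FIELDS = ("sentence", "group", "label")


def read_instances(handle, delimiter=","):
    """Yield one dict per well-formed row, keyed by the assumed FIELDS."""
    reader = csv.reader(handle, delimiter=delimiter)
    for row in reader:
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed rows
        yield dict(zip(FIELDS, row))


def count_labels(instances):
    """Count how many instances carry each label."""
    return Counter(inst["label"] for inst in instances)


# Made-up placeholder rows with hypothetical label codes.
SAMPLE = io.StringIO(
    "placeholder sentence one,B,pt-BR\n"
    '"placeholder, with a comma",B,pt-PT\n'
    "placeholder sentence three,B,pt-BR\n"
)

counts = count_labels(read_instances(SAMPLE))
print(counts)
```

Counting labels this way is a quick sanity check that a downloaded split contains the expected number of instances per language (18,000 training, 2,000 development).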

Group A (Bosnian, Croatian, Serbian)
Group B (Brazilian Portuguese, European Portuguese)
Group C (Indonesian, Malaysian)
Group D (Czech, Slovakian)
Group E (Peninsular Spanish, Argentine Spanish)
Group F (American English, British English)
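The grouping above can be written down as a small lookup table, which is handy when checking whether a system's errors stay within a group. This is only a sketch: the names GROUPS, LANGUAGE_TO_GROUP, and same_group are illustrative and not part of the shared task materials.

```python
# Language groups as listed in the call; identifiers here are illustrative.
GROUPS = {
    "A": ["Bosnian", "Croatian", "Serbian"],
    "B": ["Brazilian Portuguese", "European Portuguese"],
    "C": ["Indonesian", "Malaysian"],
    "D": ["Czech", "Slovakian"],
    "E": ["Peninsular Spanish", "Argentine Spanish"],
    "F": ["American English", "British English"],
}

# Reverse lookup: language name -> group letter.
LANGUAGE_TO_GROUP = {
    language: group
    for group, languages in GROUPS.items()
    for language in languages
}


def same_group(language_a, language_b):
    """Return True if both languages belong to the same group."""
    return LANGUAGE_TO_GROUP[language_a] == LANGUAGE_TO_GROUP[language_b]


# The call mentions 13 languages in 6 groups.
print(len(LANGUAGE_TO_GROUP), "languages in", len(GROUPS), "groups")
```

With this table, a confusion between Czech and Slovakian can be told apart from a confusion between Czech and Serbian, which crosses group boundaries.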

We allow two kinds of submissions (please indicate which one when you fill in your registration form):

Closed submission: Using only the training corpus provided by the DSL shared task.
Open submission: Using any corpus for training, including the DSL one.

Important Dates

Training set release: March 20th, 2014
Test set release: April 21st, 2014
Submissions due: April 23rd, 2014 (23:59 EST)
Results announced: April 30th, 2014
Short papers deadline: May 30th, 2014
Feedback: June 20th, 2014
Camera-ready versions: June 27th, 2014


Marcos Zampieri (Saarland University, Germany)
Liling Tan (Saarland University, Germany)
Nikola Ljubešić (University of Zagreb, Croatia)
Jörg Tiedemann (Uppsala University, Sweden)


Shared Task: dsl.sharedtask at gmail.com
Workshop: vardialworkshop at gmail.com
