In conjunction with EACL 2017, Valencia, Spain
Sponsored by SIGSLAV: Special Interest Group on Slavic Natural Language Processing of the ACL

Submission deadline: 20 January 2017 (anywhere in the world)
Notification of acceptance: 11 February 2017
Camera-ready papers due: 21 February 2017
Workshop: 3 or 4 April 2017
THEME and MOTIVATION
Languages from the Balto-Slavic group play an important role due to their diverse cultural heritage and widespread use -- with over 400 million speakers worldwide. The recent political and economic developments in Central and Eastern Europe have brought Balto-Slavic societies and their languages into focus in terms of rapid technological advancement and rapidly expanding consumer markets.
This Workshop addresses Natural Language Processing (NLP) for the Balto-Slavic languages. The NLP tasks in urgent need of attention include, but are not limited to:
- morphological analysis and generation,
- morphosyntactic tagging,
- syntactic and semantic parsing,
- lexical semantics,
- named-entity recognition,
- text normalisation and processing non-standard language,
- coreference resolution,
- information extraction,
- question answering,
- information retrieval,
- text summarization,
- machine translation,
- development of linguistic resources.
Research on theoretical and applied topics in the context of some of the Balto-Slavic languages is still in its early stages. The linguistic phenomena specific to Balto-Slavic languages -- such as rich morphological inflection and free word order -- make the construction of NLP tools for these languages a challenging and intriguing task.
The goal of this Workshop is to bring together researchers from academia and industry working on NLP for Balto-Slavic languages. In particular, the Workshop will serve to stimulate research on NLP techniques for Balto-Slavic languages, and to foster the creation of tools and resources for these languages. The Workshop will provide a forum for exchanging ideas and experience, discussing difficult-to-tackle problems, and making the available resources more widely known. One fascinating aspect of this sub-family of languages is its striking structural similarity, together with an easily recognizable core vocabulary and inflectional inventory spanning the entire group -- despite a lack of mutual intelligibility. This creates a special environment in which researchers can fully appreciate the shared problems and solutions and communicate naturally.
There will be two types of submissions: long papers and short papers.
Long papers should describe original, unpublished and completed work. Short papers should describe: (a) work in progress and/or small focused contributions, or (b) system demonstrations, new linguistic resources and experience of using existing software and resources, or (c) ongoing projects and activities that are relevant to all stakeholders in the domain of Balto-Slavic NLP.
Overlap with previously published work should be clearly mentioned. Authors should indicate along with their submission whether the paper has been submitted elsewhere. If the paper was rejected by the main conference, this should be indicated in the submission.
All submissions will be judged on correctness, novelty, technical strength, clarity of presentation, usability, and significance/relevance to the Workshop. Submissions will be reviewed by at least three members of the Program Committee.
The reviewing of long papers will be blind. Therefore, long papers should not include the authors' names and affiliations. Self-citations and other references that reveal the authors' identity must be avoided.
In particular, submissions describing systems, resources, or solutions that are made available to the wider public are strongly encouraged, as this helps to promote computational linguistics applications for Balto-Slavic languages.
Long paper submissions should follow the two-column format of the EACL 2017 proceedings and should not exceed eight (8) pages of content plus two (2) additional pages for references. Short paper submissions should follow the same format and should not exceed four (4) pages of content plus two (2) additional pages for references. Submissions must conform to the official style guidelines of EACL 2017, which are contained in the style files (http://eacl2017.org/images/site/eacl-2017-template.zip), and must be in PDF.
Camera-ready versions of accepted papers must be provided both in LaTeX and PDF format.
For the first time at BSNLP, we are envisioning a shared task on Named Entity Recognition and lemmatization in heterogeneous and multilingual collections of Web documents in Slavic languages. The Joint Research Centre of the European Commission will provide a corpus consisting of sets of links to Web documents, where each set covers documents that are related to one specific entity and contains information in several Slavic languages. The participants will be tasked with testing their multilingual techniques for named entity recognition and named entity lemmatization, the latter generally being particularly difficult for Slavic languages due to rich inflection, free word order, derivation and other phenomena. This area is highly relevant for the development of Entity Linking, which in turn enables multilingual and cross-lingual information access, semantic processing based on knowledge graphs, etc.
The participants of the shared task will be invited to submit the description of their solutions and experience as short papers.
Željko Agić (University of Copenhagen, Denmark)
Tomaž Erjavec (Jožef Stefan Institute, Slovenia)
Katja Filippova (Google, Zurich, Switzerland)
Darja Fišer (University of Ljubljana, Slovenia)
Radovan Garabik (Comenius University in Bratislava, Slovakia)
Goran Glavaš (University of Mannheim, Germany)
Maxim Gubin (Facebook Inc., USA)
Miloš Jakubíček (Masaryk University, Brno, Czech Republic)
Tomas Krilavičius (Vytautas Magnus University, Kaunas, Lithuania)
Cvetana Krstev (University of Belgrade, Serbia)
Vladislav Kubon (Charles University, Prague, Czech Republic)
Nikola Ljubešić (Jožef Stefan Institute, Ljubljana, Slovenia)
Olga Mitrofanova (St. Petersburg State University, Russia)
Preslav Nakov (Qatar Computing Research Institute, Qatar)
Maciej Ogrodniczuk (Polish Academy of Sciences, Poland)
Petya Osenova (Bulgarian Academy of Sciences, Bulgaria)
Maciej Piasecki (Wroclaw University of Technology, Poland)
Jakub Piskorski (Joint Research Centre, Ispra, Italy/PAS, Warsaw, Poland)
Lidia Pivovarova (University of Helsinki/St. Petersburg State University, Russia)
Alexandr Rosen (Charles University, Prague)
Tanja Samardžić (University of Geneva, Switzerland)
Agata Savary (University of Tours, France)
Kiril Simov (Bulgarian Academy of Sciences, Bulgaria)
Inguna Skadina (University of Latvia, Latvia)
Jan Šnajder (University of Zagreb, Croatia)
Serge Sharoff (University of Leeds, UK)
Josef Steinberger (University of West Bohemia, Czech Republic)
Stan Szpakowicz (University of Ottawa, Canada)
Hristo Tanev (Joint Research Centre, Italy)
Irina Temnikova (Qatar Computing Research Institute, Qatar)
Roman Yangarber (University of Helsinki, Finland)
Marcin Woliński (Polish Academy of Sciences, Warsaw, Poland)
Daniel Zeman (Charles University, Czech Republic)
Tomaž Erjavec, Jožef Stefan Institute, Slovenia
Jakub Piskorski, Joint Research Centre of the European Commission, Ispra, Italy
Lidia Pivovarova, University of Helsinki, Finland
Jan Šnajder, University of Zagreb, Croatia
Josef Steinberger, University of West Bohemia, Czech Republic
Roman Yangarber, University of Helsinki, Finland