[Corpora-List] BSNLP-2017 Shared Task on multi-lingual named entity recognition for Slavic languages

Lidia Pivovarova lidia.pivovarova at gmail.com
Wed Jan 18 11:03:48 CET 2017


CALL FOR PARTICIPATION *Multilingual named entity recognition for Slavic languages*

Shared task website: http://bsnlp-2017.cs.helsinki.fi/shared_task.html
Sponsored by ACL SIGSlav <http://sigslav.cs.helsinki.fi/>
Email contact: <bsnlp at cs.helsinki.fi>

The ACL Special Interest Group on Slavic NLP invites participation in the Shared Task on multilingual named entity recognition for Slavic languages. Results of the shared task will be presented at the BSNLP-2017 Workshop, to be held at EACL-2017 <http://eacl2017.org> in Valencia, Spain, on 4 April 2017.

The task aims at recognizing mentions of named entities in web documents in Slavic languages, their normalization / lemmatization, and cross-language matching. Due to rich inflection, free word order, derivation and other phenomena exhibited by Slavic languages, the detection of names and their lemmatization pose a challenging task. Fostering research and development on this problem (and the closely related problem of entity linking) is of paramount importance for enabling multilingual and cross-lingual information access.

The shared task initially covers seven languages:
- Croatian,
- Czech,
- Polish,
- Russian,
- Slovak,
- Slovene,
- Ukrainian

and focuses on recognition of four types of named entities:
- persons,
- locations,
- organizations, and
- miscellaneous,

where the last category covers mentions of all other types of named entities, e.g., products, events, etc.

This is the first edition of the task, and it is intended to be expanded to additional entity types and languages in the future.

The task focuses on cross-lingual document-level extraction of named entities, i.e., the systems should recognize, classify, and extract all named entity mentions in a document, but detecting the position of each named entity mention in text is not required.

*Data*

The input text collection consists of sets of documents from the Web, each collection revolving around a certain entity. The corpus was obtained by posing a query to a search engine and parsing the HTML of relevant documents. The training data consists of two sets of about 200 documents each.

Registered participants will receive the full corpora and further information via email directly after registration.

The test data set will be provided to registered participants in February, in exactly the same format as the training data, i.e., the content of each collection will be focused on one particular entity. Please see the Timeline section for further information.

Detailed information about data formats, rules for entity types, system response guidelines, evaluation metrics and procedure, publication of results, and ongoing updates about the Shared Task will be announced on the BSNLP 2017 web page at: http://bsnlp-2017.cs.helsinki.fi/shared_task.html and on the SIGSLAV mailing list at: https://groups.google.com/forum/?fromgroups#!forum/sigslav

Timeline

12 December 2016   Shared task announcement and release of training/trial data
12 December 2016   First Call for Participation
21 December 2016   Second Call for Participation
10 January 2017    Final Call for Participation
16 January 2017    Deadline for submission of system papers (not mandatory)
11 February 2017   Release of blind test data for registered participants
12 February 2017   Notification of acceptance of system papers
13 February 2017   Announcement of the results of the evaluation to participants
21 February 2017   Camera-ready system papers due (including the received results of the evaluation)
4 April 2017       BSNLP 2017 workshop


