[Corpora-List] SemEval-2022 Task 11: Multilingual Complex Named Entity Recognition - Third Call for Participation

Sudipta Kar sudipta.kar.8080 at gmail.com
Mon Jan 3 20:25:05 CET 2022


Hi,

We invite you to participate in SemEval-2022 Task 11: *Multi*lingual *Co*mplex *N*amed *E*ntity *R*ecognition (MultiCoNER).

*Task Website:* https://multiconer.github.io/

*Codalab (Data download + Submission):* https://competitions.codalab.org/competitions/36044

This task focuses on the detection of complex entities, such as movie, book, music, and product titles, in low-context settings (short and uncased text). The task covers 3 domains (sentences, search queries, and questions) and provides data in 11 languages: *English, Spanish, Dutch, Russian, Turkish, Korean, Farsi, German, Chinese, Hindi*, and *Bangla*.

Here are some examples in English, Chinese, Bangla, Hindi, Russian, Korean, and Farsi, where entities are enclosed inside brackets with their type:

- the original [ferrari daytona | PRODUCT] replica driven by [don johnson | PERSON] in [miami vice | CreativeWork]

- 它 的 座 位 在 [圣 布 里 厄 | LOCATION] .

- স্টেশনটির মালিক [টাউনস্কেয়ার মিডিয়া | CORPORATION] ।

- यह [कनेल विभाग | LOCATION] की राजधानी है।

- в основе фильма — стихотворение [г. сапгира | PERSON] .

- [블루레이 디스크 | PRODUCT] : 광 기록 방식 저장매체의 하나

- [نینتندو | CORPORATION] / [باندای نامکو انترتینمنت | CORPORATION] – [برادران سوپر ماریو نهایی | CreativeWork]
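For readers who want to experiment with the bracketed examples above, here is a minimal Python sketch that turns the `[entity | TYPE]` notation into token-level labels. It assumes a BIO labeling scheme (`B-`/`I-` prefixes, `O` for non-entity tokens), which is a common NER convention and is not specified in this message; the function name `bracketed_to_bio` is hypothetical, and the released data may use a different file format, so please check the task website for the actual one.

```python
import re

# Matches one "[entity text | TYPE]" span as written in the examples above.
SPAN = re.compile(r"\[\s*([^\[\]|]+?)\s*\|\s*([^\[\]|]+?)\s*\]")

def bracketed_to_bio(text):
    """Convert '[entity | TYPE]' notation into (token, label) pairs.

    Assumption: whitespace tokenization and BIO labels; neither is
    taken from the task description.
    """
    pairs = []
    pos = 0
    for match in SPAN.finditer(text):
        # Tokens between the previous entity and this one are outside any entity.
        pairs.extend((tok, "O") for tok in text[pos:match.start()].split())
        entity_tokens = match.group(1).split()
        entity_type = match.group(2)
        for i, tok in enumerate(entity_tokens):
            prefix = "B-" if i == 0 else "I-"
            pairs.append((tok, prefix + entity_type))
        pos = match.end()
    # Remaining tokens after the last entity.
    pairs.extend((tok, "O") for tok in text[pos:].split())
    return pairs

if __name__ == "__main__":
    example = ("the original [ferrari daytona | PRODUCT] replica driven by "
               "[don johnson | PERSON] in [miami vice | CreativeWork]")
    for token, label in bracketed_to_bio(example):
        print(f"{token}\t{label}")
```

Splitting on whitespace works for the examples shown here because they are already space-separated, including the Chinese one; other inputs may need a proper tokenizer.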

A *multilingual NER track* is also offered for multilingual systems that can process all languages. A *code-mixed track* allows participants to build systems that process inputs with tokens coming from two languages. The following code-mixed examples draw on Turkish, Spanish, Dutch, German, and English:

- it was produced at the [soyuzmultfilm | GROUP] studio in [moskova | LOCATION] .

- [arturo vidal | PERSON] ( born 1987 ) , professional footballer playing for [fútbol club barcelona | GROUP]

- daarmee promoveerde hij toen naar de [premier league | CORPORATION] .

- piracy has been a part of the [sultanat von sulu | LOCATION] culture .

The task focuses on detecting semantically ambiguous and complex entities in short and low-context settings. Participants are welcome to build NER systems for any number of languages, and we encourage them to aim for the bigger challenge of building NER systems for multiple languages. The task also tests the domain adaptation capability of the systems by evaluating them on additional test sets of questions and short search queries.

We have released training data for 11 languages along with a baseline system to start with. Participants can submit a system for a single language but are encouraged to aim for the bigger challenge of building multilingual NER systems.

*Task Website:* https://multiconer.github.io/

*Codalab Submission site:* https://competitions.codalab.org/competitions/36044

*Mailing List:* multiconer-semeval at googlegroups.com

*Slack Workspace:* https://join.slack.com/t/multiconer/shared_invite/zt-vi3g97cx-MpqTvS07XX22S78nRC2s0Q

*Training Data:* https://multiconer.github.io/dataset

*Baseline System:* https://multiconer.github.io/baseline

*Shared task schedule:*

- Training data ready: September 3, 2021

- Evaluation data ready: December 3, 2021

- Evaluation start: January 24, 2022

- Evaluation end: by January 28, 2022

- System description paper submissions due: February 23, 2022

- Notification to authors: March 31, 2022

*Task organizers*

- Shervin Malmasi (Amazon)

- Besnik Fetahu (Amazon)

- Anjie Fang (Amazon)

- Sudipta Kar (Amazon)

- Oleg Rokhlenko (Amazon)

Please reach out to the organizers at multiconer-semeval-organizers at googlegroups.com, or join the Slack workspace to connect with the other participants and organizers.

Thank you,

Sudipta Kar
Applied Scientist
Amazon Alexa AI
+1 8326437277
http://sudiptakar.info


