Are you also tired of shared tasks where the largest fine-tuned transformer-based model wins yet another classification or sequence labeling task? Then consider participating in Multi-LexNorm!
We have datasets for 12 languages and language pairs, with varying amounts of training data: Croatian, Danish, Dutch, English, German, Indonesian-English, Italian, Serbian, Slovenian, Spanish, Turkish, Turkish-German.
The data freeze is now in effect: we have incorporated feedback from participants and tried to homogenize the annotations across the different datasets. Unless serious issues are found, the data is now final. The data can be found at: https://bitbucket.org/robvanderg/multilexnorm/
The open-source system with the highest Error Reduction Rate (ERR), macro-averaged over all languages, will be the official winner. We will also provide a downstream evaluation on dependency parsing for selected languages.
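For readers new to the metric: in earlier lexical normalization work, ERR is commonly defined as word-level accuracy rescaled against the leave-as-is baseline (which copies every input word unchanged), i.e. ERR = (acc_system - acc_baseline) / (1 - acc_baseline). A score of 1 means every word is normalized correctly, 0 means no improvement over copying the input, and a negative score means the system breaks more correct words than it fixes. The Python sketch below assumes this definition and word-aligned input; the function names and toy data are hypothetical, and the official evaluation script in the repository remains authoritative.

```python
from typing import Dict, List, Sequence, Tuple


def error_reduction_rate(raw: Sequence[str], gold: Sequence[str], pred: Sequence[str]) -> float:
    """ERR relative to a leave-as-is baseline (assumed definition, not the official script)."""
    if not (len(raw) == len(gold) == len(pred)):
        raise ValueError("raw, gold and pred must be aligned word by word")
    n = len(gold)
    # The leave-as-is baseline copies every raw word unchanged.
    correct_baseline = sum(r == g for r, g in zip(raw, gold))
    correct_system = sum(p == g for p, g in zip(pred, gold))
    # Number of words that actually need normalization; ERR is undefined without any.
    needs_norm = n - correct_baseline
    if needs_norm == 0:
        raise ValueError("no words need normalization; ERR is undefined")
    # Algebraically equal to (acc_sys - acc_base) / (1 - acc_base), but computed
    # from counts so the intermediate accuracies introduce no rounding error.
    return (correct_system - correct_baseline) / needs_norm


def macro_average_err(per_language: Dict[str, Tuple[List[str], List[str], List[str]]]) -> float:
    """Unweighted mean of the per-language ERR scores."""
    scores = [error_reduction_rate(raw, gold, pred) for raw, gold, pred in per_language.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: (raw, gold, prediction) per "language".
    toy = {
        "toy-a": (["u", "r", "cool"], ["you", "are", "cool"], ["you", "r", "cool"]),
        "toy-b": (["gr8", "day"], ["great", "day"], ["gr8", "day"]),
    }
    print(macro_average_err(toy))  # 0.25 (ERR 0.5 and 0.0)
```

Macro-averaging gives every language the same weight regardless of test-set size, so a system cannot win by doing well on the largest datasets alone.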
More information is available at: http://noisy-text.github.io/2021/multi-lexnorm.html
The remaining timeline is as follows:
Test data available: Aug 25, 2021
Final evaluation: Sep 1, 2021
Paper deadline: Sep 22, 2021
Reviews: Oct 1, 2021
Camera ready: Oct 8, 2021
Workshop: Nov 11, 2021
Best,
The organizing committee:
Rob van der Goot
Barbara Plank
Alan Ramponi
Tommaso Caselli
Nikola Ljubešić
Timothy Baldwin
Özlem Çetinoglu
Benjamin Muller
Talha Çolakoğlu
Arkaitz Zubiaga
Iñaki San Vicente Roncal
Wladimir Sidorenko