[Corpora-List] Last call for participation - CMCL 2022 Shared Task on Multilingual and Crosslingual Prediction of Human Reading Behavior

Nora Hollenstein nora.hollenstein at gmail.com
Fri Jan 14 13:23:13 CET 2022

*CMCL 2022 Shared Task: Multilingual and crosslingual prediction of human reading behavior*

The benefits of eye movement data for machine learning have been assessed in various domains, including NLP and computer vision. Eye tracking provides millisecond-accurate records of where humans look while reading and is useful in explanatory research on language processing. Eye movements depend on the stimulus and are therefore language-specific, but there are universal tendencies that remain stable across languages (Liversedge et al., 2016). Modelling human reading has been researched extensively in psycholinguistics (e.g., Reichle et al., 1998; Matthies & Søgaard, 2013; Hahn & Keller, 2016; Sood et al., 2008). Being able to accurately predict eye-tracking features across languages will advance this field and facilitate comparisons between models and the analysis of their varying capabilities.

In this shared task, we address the challenge of predicting eye-tracking features recorded during sentence processing in multiple languages. We are interested in both cognitive modelling approaches and linguistically motivated approaches (i.e., language models).

There are two major changes compared to the CMCL 2021 Shared Task on eye-tracking prediction:
- Multilingual data: We provide a dataset compiled from eight openly available eye movement corpora with sentences in six languages (Chinese, Dutch, English, German, Hindi, Russian).
- Eye-tracking features: To take into account individual differences between readers, the task is not limited to predicting the mean eye-tracking features across readers; it also includes the standard deviation of the feature values.

* Task objective * The shared task is formulated as a regression task to predict two eye-tracking features and their standard deviations across readers:
(1) first fixation duration (FFD), the duration of the first fixation on the current word;
(2) the standard deviation of FFD across readers;
(3) total reading time (TRT), the sum of all fixation durations on the current word, including regressions;
(4) the standard deviation of TRT across readers.
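To make the four targets concrete, here is a minimal sketch of how they could be derived from raw fixations. It rests on several assumptions that are not part of the task description: the input format (per reader, a chronological list of `(word_index, duration_ms)` fixations for one sentence) and the helper names `reader_features` and `aggregate` are hypothetical; words a reader never fixates are counted as 0 ms; and the population standard deviation is used. The official shared-task data may be computed differently.

```python
from statistics import mean, pstdev

# Hypothetical input format (an assumption, not the shared-task file format):
# for one reader, the chronological fixations on a sentence as (word_index, duration_ms).
Fixations = list[tuple[int, float]]


def reader_features(fixations: Fixations, n_words: int) -> tuple[list[float], list[float]]:
    """Per-word FFD and TRT for a single reader.

    Assumption: words the reader never fixates get 0 ms for both features.
    """
    ffd = [0.0] * n_words
    trt = [0.0] * n_words
    seen = set()
    for word, dur in fixations:
        if word not in seen:
            # FFD: duration of the first fixation that lands on this word
            ffd[word] = dur
            seen.add(word)
        # TRT: sum of all fixations on the word, including regressions
        trt[word] += dur
    return ffd, trt


def aggregate(readers: list[Fixations], n_words: int) -> dict[str, list[float]]:
    """Mean and (population) standard deviation across readers for each word."""
    per_reader = [reader_features(f, n_words) for f in readers]
    out = {"FFD": [], "FFD_std": [], "TRT": [], "TRT_std": []}
    for i in range(n_words):
        ffds = [ffd[i] for ffd, _ in per_reader]
        trts = [trt[i] for _, trt in per_reader]
        out["FFD"].append(mean(ffds))
        out["FFD_std"].append(pstdev(ffds))
        out["TRT"].append(mean(trts))
        out["TRT_std"].append(pstdev(trts))
    return out


# Usage: two readers, a three-word sentence; reader A regresses to word 0.
reader_a = [(0, 200), (1, 180), (0, 150), (2, 220)]
reader_b = [(0, 240), (2, 260)]  # reader B skips word 1
targets = aggregate([reader_a, reader_b], n_words=3)
print(targets)
```

In this toy example, reader A's regression raises the TRT of word 0 (350 ms) but leaves its FFD at 200 ms, which is why FFD and TRT are predicted as separate targets; the skipped word 1 inflates the standard deviations under the 0 ms assumption.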

* Subtasks *
- Subtask 1 - Multilingual prediction: Predict eye-tracking features for sentences in the 6 provided languages.
- Subtask 2 - Crosslingual prediction: Predict eye-tracking features for sentences in a new surprise language.

* Data * Eye-tracking data during natural reading from 8 datasets in 6 languages.

* Platform * CodaLab: https://competitions.codalab.org/competitions/36415

* Timeline *
- December 6, 2021: Trial data release
- December 20, 2021: Training data (& dev data) release
- January 19, 2022: Participant registration deadline & test data release
- February 1, 2022: Submission deadline for system predictions
- February 6, 2022: Results release
- February 28, 2022: Submission deadline for system description papers
- March 26, 2022: Notification of acceptance
- April 10, 2022: Camera-ready papers due
- May 26-28, 2022: CMCL Workshop (@ACL)

* Contact * cmclsharedtask at gmail.com

* Papers * Participants are expected to submit short system description papers (4 pages + references), which will be published in the workshop proceedings.
