8th Workshop on NLP4CALL, NoDaLiDa 2019

NoDaLiDa, Turku, Finland, 30 September 2019

Workshop website: https://spraakbanken.gu.se/eng/research-icall/8th-nlp4call
Twitter: @NLP4CALL

Call for participation

We invite you to participate in the workshop on NLP for Computer-Assisted Language Learning. The program will soon be published on the workshop webpage. To register, please visit the NoDaLiDa conference site: https://nodalida2019.org/

The workshop series on Natural Language Processing for Computer-Assisted Language Learning (NLP4CALL) is a meeting place for researchers working on the integration of Natural Language Processing and Speech Technologies in CALL systems and exploring the theoretical and methodological issues arising in this connection. The latter includes, among others, drawing on insights from Second Language Acquisition (SLA) research, on the one hand, and promoting the development of “Computational SLA” through setting up Second Language research infrastructure(s), on the other.

The intersection of Natural Language Processing (or Language Technology / Computational Linguistics) and Speech Technology with Computer-Assisted Language Learning (CALL) brings “understanding” of language to CALL tools, thus making CALL intelligent. This gives the research area its name: Intelligent CALL (ICALL). As the definition suggests, apart from having excellent knowledge of Natural Language Processing and/or Speech Technology, ICALL researchers need good insights into second language acquisition theories and practices, as well as knowledge of second language pedagogy and didactics. This workshop therefore invites a wide range of ICALL-relevant research, including studies where NLP-enriched tools are used for testing SLA and pedagogical theories and, vice versa, studies where SLA theories, pedagogical practices or empirical data are modeled in ICALL tools. The NLP4CALL workshop series aims to bring together competencies from these areas for sharing experiences and brainstorming about the future of the field.

We welcomed papers:
- that describe research directly aimed at ICALL;
- that demonstrate actual or discuss the potential use of existing Language and Speech Technologies or resources for language learning;
- that describe the ongoing development of resources and tools with potential usage in ICALL, either directly in interactive applications, or indirectly in materials, application or curriculum development, e.g. learning material generation, assessment of learner texts/responses, individualized learning solutions, provision of feedback;
- that discuss challenges and/or research agenda for ICALL;
- that describe empirical studies on language learner data.

This year, a special focus is given to the established and upcoming infrastructures aimed at SLA and learner corpus research, covering questions such as data collection, legal issues, reliability of annotation, annotation tool development, search environments for SLA-relevant data, etc.

Invited speakers

This year, we have the pleasure of welcoming two invited speakers: Thomas François (Université catholique de Louvain) and Egon Stemle (Eurac Research).

In his talk entitled "Assessing language complexity for L2 readers with NLP techniques and corpora", Thomas François will summarize the main trends regarding the automatic assessment of language complexity for L2 readers and focus on three research projects. To illustrate the readability approach, the DMesure project will be presented. It is the first computational readability formula specialized for readers of French as a foreign language. Secondly, the talk will discuss the use of corpora to assess language complexity through CEFRLex, an international project providing, for some of the main European languages, lexical resources describing the frequency distributions of words across the six levels of competence of the Common European Framework of Reference for Languages (CEFR). These distributions have been estimated on corpora of pedagogical materials intended for L2 purposes such as textbooks and simplified readers.

Egon Stemle, in his talk entitled "Towards an infrastructure for FAIR language learner corpora", will first investigate CMC corpora, which resemble language learner corpora in some core aspects, with regard to their compliance with the FAIR principles, and discuss to what extent depositing research data in repositories of data preservation initiatives such as CLARIN, Zenodo or META-SHARE can assist in the provision of FAIR corpora. Second, he will show some modern software technologies and how they make reproducible the process of software packaging, installation, and execution and, more importantly, the tracking of corpora throughout their life cycle. This in turn makes changes to raw data reproducible for many subsequent analyses.


David Alfter (1), Elena Volodina (1), Ildikó Pilán (2), Herbert Lange (3), Lars Borin (1)

(1) Språkbanken, University of Gothenburg
(2) City University of Hong Kong and University of Oslo
(3) Department of Computer Science and Engineering, University of Gothenburg and Chalmers University of Technology, Sweden


David Alfter, david.alfter at gu.se

For further information, please see the workshop webpage: https://spraakbanken.gu.se/eng/research-icall/8th-nlp4call

David Alfter, Doctoral Researcher

Språkbanken, Department of Swedish
University of Gothenburg
Box 200
SE-405 30 Gothenburg
+46 (0)31 786 4543

