[Corpora-List] CFPs: Fourth Workshop on Computational Approaches to Linguistic Code-Switching

Thamar Solorio thamar.solorio at gmail.com
Tue Jan 21 20:41:05 CET 2020

[Apologies for cross-listing]

Call for Papers

Code-switching (CS) is the phenomenon by which multilingual speakers switch back and forth between their common languages in written or spoken communication. CS typically occurs at the intersentential, intrasentential (mixing of words from multiple languages in the same utterance) and even morphological (mixing of morphemes) levels. CS presents serious challenges for language technologies such as parsing, machine translation (MT), automatic speech recognition (ASR), information retrieval (IR) and extraction (IE), and semantic processing. Traditional techniques trained for one language quickly break down when input from another language is mixed in. Even for problems that are considered solved for specific domains and languages, such as language identification or part-of-speech tagging, performance degrades at a rate proportional to the amount and level of mixed-language content present.

This workshop aims to bring together researchers interested in solving the problem and to increase community awareness of viable solutions that reduce the complexity of the phenomenon. The workshop invites contributions from researchers working on NLP approaches for the analysis and processing of mixed-language data, especially with a focus on intrasentential code-switching. Topics of relevance to the workshop include the following:


Development of linguistic resources to support research on code-switched data



NLP approaches for language identification in code-switched data


NLP approaches for named entity recognition in code-switched data


NLP techniques for the syntactic analysis of code-switched data


NLP techniques for higher-level tasks on code-switched data, such as Q&A, language understanding, grounding


Domain/dialect/genre adaptation techniques applied to code-switched data



Language modeling approaches to code-switched data processing


Crowdsourcing approaches for the annotation of code-switched data


Machine translation approaches for code-switched data


Multimodal approaches to processing code-switched data


Application of low-resource processing paradigms to code-switched data



Position papers discussing the challenges of code-switched data to NLP



Methods for improving ASR in code-switched data


Survey papers of NLP research for code-switched data


Sociolinguistic aspects of code-switching


Sociopragmatic aspects of code-switching


This year we propose a theme for the workshop around resources and evaluation metrics and frameworks. The goal of the theme is to disseminate more broadly the data sets that are available for the research community, and to engage the community in a discussion about adopting best practices and common frameworks to enable a comprehensive evaluation of technology for code-switched data. We welcome submissions responsive to the theme, in addition to the topics listed above.

Important Dates:

Paper submission: February 20th, 2020

Notification of acceptance: March 16th, 2020

Camera-ready submission deadline: April 5th, 2020

Invited Speakers:

Alan W. Black, Carnegie Mellon University

Organizing Committee:

Thamar Solorio

Associate Professor

Department of Computer Science

University of Houston

thamar.solorio at gmail.com

Research interests: syntactic analysis of code-switched data, information extraction for social media data, analysis of style in text, detection of objectionable content online

Monojit Choudhury

Principal Researcher

Microsoft Research Lab India

monojitc at microsoft.com

Research interests: computational processing of code-switched text, NLP for low resource languages, computational sociolinguistics and pragmatics.

Kalika Bali

Principal Researcher

Microsoft Research Lab India

kalikab at microsoft.com

Research interests: computational processing of code-switched text and speech, NLP for low resource languages, computational sociolinguistics.

Sunayana Sitaram

Senior Researcher

Microsoft Research Lab India

sunayana.sitaram at microsoft.com

Research interests: computational processing of code-switched spoken language, speech processing for low-resource languages, speech and language systems for multilingual communities

Amitava Das

Lead Scientist

Wipro AI Lab India

amitava.das2 at wipro.com

Research interests: Code-Mixing, Social Computing, Conversational System

Mona Diab

Principal Scientist

Amazon AWS AI

Professor of Computer Science, GWU, USA

diabmona at amazon.com

Research interests: Code Switching, Low Resource Scenarios, Conversational AI

Workshop website:


Contact workshop organizers:

codeswitching_workshop at googlegroups.com

Program Committee:

Gustavo Aguilar, University of Houston

Barbara Bullock, University of Texas at Austin

Özlem Çetinoğlu, University of Stuttgart

Hila Gonen, Bar Ilan University

Sandipan Dandapat, Microsoft

A. Seza Doğruöz, Google Research

William H. Hsu, Kansas State University

Constantine Lignos, Brandeis University

Rupesh Mehta, Microsoft

Joel Moniz, Carnegie Mellon University

Adithya Pratapa, Carnegie Mellon University

Yihong Theis, Kansas State University

Jacqueline Toribio, University of Texas at Austin

Genta Indra Winata, Hong Kong University of Science and Technology