[Corpora-List] Workshop “Cross-lingual analysis and annotation of parallel and comparable corpora: Current and future trends” (23 November 2018, Université Paris Diderot)

mzimina mzimina at eila.univ-paris-diderot.fr
Sat Jul 28 10:10:05 CEST 2018


Consortium CORLI https://corli.huma-num.fr

Project Group: Multilingual and plurilingual corpora

Workshop “Cross-lingual analysis and annotation of parallel and comparable corpora: Current and future trends” (23 November 2018, Université Paris Diderot)

Workshop coordinators: Natalie Kübler (Paris Diderot), Maria Zimina (Paris Diderot), Evangelia Adamou (CNRS), and Antonio Balvet (Université de Lille)

The annotation of multilingual corpora raises methodological uncertainties, as do the potential impacts of relatively new tools and methods for the simultaneous cross-lingual analysis of parallel and comparable corpora (including LSP corpora, code-switching corpora, multilingual speech treebanks, etc.). Research on multilingual and plurilingual corpora faces many challenges, as most tools and software available for specific languages or language pairs differ from each other in important ways, such as annotation methods, available tagsets, or interaction workflow scenarios. These discrepancies complicate building a robust framework for a multilingual system of analysis.

We are delighted to invite academics, practitioners, representatives of civil society, and anyone interested in research on multilingual language data analysis to contribute to this CORLI Workshop. Our aim is to bring together research and researchers from a wide variety of theoretical backgrounds and disciplines, to encourage discussion, and to address the following issues:

Workshop objectives

- Identify issues that need to be addressed in order to make informed decisions on multilingual text and speech processing.
- Share hands-on experience in designing, using and evaluating multilingual and plurilingual corpora within specific linguistic projects (dealing with terminology, phraseology, translation, discourse analysis, code-switching, etc.).
- Understand and construct frameworks for multi-level annotation and cross-lingual analysis of parallel and comparable corpora, explaining how specific tools and methods contribute to specific research objectives.
- Understand how to develop and implement cross-lingual analysis of multilingual corpora using natural language processing, qualitative and quantitative analyses.
- Understand and be able to select units of analysis that are appropriate to the processing tools and methods.
- Understand how to disseminate and use annotated aligned corpora in several languages.

Short bibliography

Çetinoğlu, Özlem, Sarah Schultz & Thang Vu. (2016). Challenges of computational processing of code-switching. In Mona Diab, Pascale Fung, Mahmoud Ghoneim, Julia Hirschberg & Thamar Solorio (eds.), Proceedings of the Second Workshop on Computational Approaches to Code Switching, Austin, Texas, 1–11. Association for Computational Linguistics.

Guzmán, Gualberto A., Jacqueline Serigos, Barbara E. Bullock & Almeida J. Toribio. (2016). Simple tools for exploring variation in code-switching for linguists. In Mona Diab, Pascale Fung, Mahmoud Ghoneim, Julia Hirschberg & Thamar Solorio (eds.), Proceedings of the Second Workshop on Computational Approaches to Code Switching, Austin, Texas, 12–20. Association for Computational Linguistics.

Sailer, Manfred & Stella Markantonatou (eds.). (2018). Multiword Expressions: Insights from a Multi-lingual Perspective. Language Science Press, Berlin.

Sharoff, Serge. (2018). Language adaptation experiments via cross-lingual embeddings for related languages. In Proc LREC, Miyazaki, Japan, May 2018.

Rehm, Georg, Daniel Stein, Felix Sasaki & Andreas Witt. (2018). Language technologies for a multilingual Europe. Translation and Multilingual Natural Language Processing. Language Science Press, Berlin.

Tiedemann, Jörg. (2011). Bitext Alignment, Synthesis Lectures on Human Language Technologies. San Rafael, Morgan & Claypool Publishers.

Tiedemann, Jörg. (2017). Cross-Lingual Dependency Parsing for Closely Related Languages - Helsinki's Submission to VarDial 2017. CoRR abs/1708.05719.

Zweigenbaum, Pierre, Serge Sharoff & Reinhard Rapp. (2018). A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora. In Proc LREC, Miyazaki, Japan, May 2018.

Program committee members

Evangelia Adamou (CNRS), Nicolas Ballier (Paris Diderot), Antonio Balvet (Université de Lille), Geneviève Bordet (Paris Diderot), Nicolas Frœliger (Paris Diderot), Chris Gledhill (Paris Diderot), Clive Hamilton (Paris Diderot), Olivier Kraïf (Université Grenoble Alpes), Natalie Kübler (Paris Diderot), Alexandra Mestivier (Paris Diderot), Mathieu Valette (INALCO), Maria Zimina (Paris Diderot).

Day of workshop: 23 November 2018

Invited speakers (to be confirmed)

Workshop website address: TBD

Application procedures and deadlines*

Persons who wish to take part in the CORLI Workshop should send a short proposal (approximately 500 words) and a short CV to:

Natalie Kübler: nkubler at eila.univ-paris-diderot.fr
Maria Zimina: mzimina at eila.univ-paris-diderot.fr

Members of CORLI who wish to attend the workshop without presenting a paper should send a short statement of interest to:

Natalie Kübler: nkubler at eila.univ-paris-diderot.fr
Maria Zimina: mzimina at eila.univ-paris-diderot.fr

Application deadline: 1 October 2018

*With funding available for a limited number of participants, preference will go to students and researchers who failed to obtain funding from their home organizations.


