1st Call for Papers

== TL;DR ==

* The 3rd edition of the workshop on “Challenges in the management of large corpora” (CMLC-3) is collocated with Corpus Linguistics 2015 in Lancaster, UK

* Workshop date: 20 July 2015, format: half-day + discussion

* We invite extended abstracts (up to 4 pages) as PDF, deadline: 22.03

* Workshop homepage with details:


== Audience and Topics ==

The third edition of CMLC will accompany Corpus Linguistics 2015 in Lancaster and will be held on 20 July 2015. This half-day workshop will gather leading researchers in the fields of language resource creation and corpus linguistics to provide a platform for an intensive exchange of expertise, results, and ideas, in particular concerning the following topics:

* recent developments in ongoing web-as-corpus initiatives, national corpora, reference corpora, and other very large corpora

* evaluation and investigation of the properties of large corpora

* extraction, representation, and management of metadata

* virtualization / techniques for drawing and accessing stratified virtual corpora

* increasing the coverage of underrepresented strata

* legal issues including license models and license management

* acquisition and curation of large text archives from third parties

* legal and technological issues of corpora physically distributed over different locations

* system and database architectures for very large semi-structured data sets

* heavily annotated corpora

* use of annotation standards for large data sets

* issues of interoperability and tool chaining

* interfaces for user-provided annotations

* quality control of annotations in large data sets

* efficient and scalable user interfaces

* effective querying of large corpora with multiple annotation layers

* effective techniques for analyzing corpus data

* strategies and techniques for maximizing recall and coping with large numbers of false positives

* visualization and other techniques that facilitate the linking between quantitative investigations and qualitative interpretations

* “put the computation near the data” as a strategy for dealing with IPR restrictions

* open-source software and open-data corpora strategies

* other issues that arise in the context of management of large datasets.

We invite extended abstracts (up to 4 pages standard size, references excluded) addressing some of the topics listed above.

A volume of proceedings is planned.

== Organising Committee ==

* Piotr Bański, Marc Kupietz, Harald Lüngen, Andreas Witt (IDS Mannheim)

* Hanno Biber, Evelyn Breiteneder (ICLTT Vienna)

== Programme Committee ==

(this is a list of the colleagues who have confirmed their participation so far)

* Damir Ćavar (Indiana University, Bloomington)

* Isabella Chiari (Sapienza University of Rome)

* Dan Cristea ("Alexandru Ioan Cuza" University of Iasi)

* Václav Cvrček (Charles University Prague)

* Mark Davies (Brigham Young University)

* Tomaž Erjavec (Jožef Stefan Institute)

* Alexander Geyken (Berlin-Brandenburgische Akademie der Wissenschaften)

* Andrew Hardie (Lancaster University)

* Serge Heiden (ENS de Lyon)

* Nancy Ide (Vassar College)

* Miloš Jakubíček (Lexical Computing Ltd.)

* Adam Kilgarriff (Lexical Computing Ltd.)

* Krister Lindén (University of Helsinki)

* Martin Mueller (Northwestern University)

* Nelleke Oostdijk (Radboud University Nijmegen)

* Christian-Emil Smith Ore (University of Oslo)

* Piotr Pęzik (University of Łódź)

* Uwe Quasthoff (Leipzig University)

* Paul Rayson (Lancaster University)

* Laurent Romary (INRIA, DARIAH)

* Roland Schäfer (FU Berlin)

* Serge Sharoff (University of Leeds)

* Mária Šimková (Slovak Academy of Sciences)

* Jörg Tiedemann (Uppsala University)

* Dan Tufiş (Romanian Academy, Bucharest)

* Tamás Váradi (Research Institute for Linguistics, Hungarian Academy of Sciences)

The home page of CMLC events is located at:


