[Corpora-List] Deadline extended: ACL 2010 Workshop on Domain Adaptation for NLP (DANLP)

Barbara Plank b.plank at rug.nl
Sun Apr 4 12:52:54 CEST 2010



ACL 2010 Workshop on Domain Adaptation for Natural Language Processing (DANLP 2010) http://sites.google.com/site/danlp2010/

July 15, 2010, Uppsala, Sweden



Most modern Natural Language Processing (NLP) systems suffer from the well-known problem of poor portability to new domains or genres: their performance drops substantially when they are tested on data from a new domain, i.e., when their test data is drawn from a distribution that is related to, but different from, that of their training data. The problem stems from the assumption, made by most machine learning systems, that training and test data are independent and identically distributed (i.i.d.), yet it has begun to receive attention only in recent years. The need for domain adaptation arises in almost all NLP tasks: part-of-speech tagging, semantic role labeling, statistical parsing and statistical machine translation, to name but a few.

Studies on supervised domain adaptation (where limited amounts of annotated resources are available in the new domain) have shown that baselines consisting of very simple models (e.g., models trained only on source-domain data, only on target-domain data, or on the union of the two) achieve relatively high performance and are "surprisingly difficult to beat" (Daumé III, 2007). One conclusion from that line of work is therefore that, as long as a reasonable (often even small) amount of labeled target data is available, it is often more fruitful simply to use it.

In contrast, semi-supervised adaptation (where no annotated resources are available in the new domain) is a much more realistic setting, but also a considerably more difficult one. Current studies on semi-supervised approaches show very mixed results. For example, Structural Correspondence Learning (Blitzer et al., 2006) was applied successfully to classification tasks, while only modest gains could be obtained for structured output tasks such as parsing. Many questions thus remain open.

The goal of this workshop is to provide a meeting point for research that approaches the problem of adaptation from the varied perspectives of machine learning and of a variety of NLP tasks such as parsing, machine translation and word sense disambiguation. We believe there is much to gain by treating domain adaptation as a general learning strategy that uses prior knowledge of a specific or a general domain when learning about a new domain. Here the notion of a 'domain' could be as varied as child language versus adult language, or the source-side word order versus the target-side word order in a statistical machine translation system.

Sharing insights, methodologies and successes across tasks will thus contribute towards a better understanding of this problem. For instance, self-training the Charniak parser alone was not effective for adaptation (the common wisdom had been that self-training is generally ineffective), but self-training with a reranker was surprisingly effective (McClosky et al., 2006). Is this an insight into adaptation that can be used elsewhere? We believe that the key to future success will be to exploit large collections of unlabeled data in addition to labeled data, not only because unlabeled data is easier to obtain, but also because existing labeled resources are often not even close to the envisioned target application domain. Directly related is the question of how to measure the similarity (or difference) between domains.

=============== Workshop Topics ===============

We especially encourage submissions on semi-supervised approaches to domain adaptation that include a deep analysis of models, data and results, although we do not exclude papers on supervised adaptation. In particular, we welcome submissions that address any of the following topics or other relevant issues:

* Algorithms for semi-supervised domain adaptation (DA)
* Active learning for DA
* Integration of expert/prior knowledge about new domains
* DA in specific applications (e.g., parsing, MT, IE, QA, IR, WSD)
* Automatic domain identification and model adjustment
* Porting algorithms developed for one type of problem structure to another (e.g., from binary classification to structured-prediction problems)
* Analysis and negative results: in-depth analysis of results, i.e., which model parts/parameters are responsible for successful adaptation; what we can learn from negative results (the impact of negative experimental results on learning strategies/parameters)
* A complementary perspective: better generalization of ML models, i.e., making NLP models more broad-coverage and domain-independent rather than domain-specific
* Learning from multiple domains

========== Submission ==========

Papers should be submitted via the ACL submission system:


All submissions are limited to 6 pages (including references) and should be formatted using the ACL 2010 style file that can be found at:


As the reviewing will be blind, papers must not include the authors' names and affiliations. Submissions should be in English and should not have been published previously. If essentially identical papers are submitted to other conferences or workshops as well, this fact must be indicated at submission time.

The extended submission deadline is 23:59 CET on April 11, 2010 (Sunday).

=============== Important Dates ===============

April 11, 2010: Submission deadline
May 11, 2010: Notification of acceptance
May 21, 2010: Camera-ready papers due
July 15, 2010: Workshop

=============== Invited speaker ===============

John Blitzer, University of California, United States

============ Organization ============

Hal Daumé III, University of Utah, USA
Tejaswini Deoskar, University of Amsterdam, The Netherlands
David McClosky, Stanford University, USA
Barbara Plank, University of Groningen, The Netherlands
Jörg Tiedemann, Uppsala University, Sweden

================= Program Committee =================

Eneko Agirre, University of the Basque Country, Spain
John Blitzer, University of California, United States
Walter Daelemans, University of Antwerp, Belgium
Mark Dredze, Johns Hopkins University, United States
Kevin Duh, NTT Communication Science Laboratories, Japan (formerly University of Washington, Seattle)
Philipp Koehn, University of Edinburgh, United Kingdom
Jing Jiang, Singapore Management University, Singapore
Oier Lopez de Lacalle, University of the Basque Country, Spain
Robert Malouf, San Diego State University, United States
Ray Mooney, University of Texas, United States
Hwee Tou Ng, National University of Singapore, Singapore
Khalil Sima'an, University of Amsterdam, The Netherlands
Michel Simard, National Research Council of Canada, Canada
Jun'ichi Tsujii, University of Tokyo, Japan
Antal van den Bosch, Tilburg University, The Netherlands
Josef van Genabith, Dublin City University, Ireland
Yi Zhang, German Research Centre for Artificial Intelligence (DFKI GmbH) and Saarland University, Germany

======= Sponsor =======

This workshop is kindly supported by the Stevin project PaCo-MT (Parse and Corpus-based Machine Translation).

======= Contact =======

Email: danlp.acl2010 at gmail.com
Website: http://sites.google.com/site/danlp2010/
