[Corpora-List] Call for contributions: NIPS 2006 Workshop on MACHINE LEARNING FOR MULTILINGUAL INFORMATION ACCESS

George Foster foster at iro.umontreal.ca
Mon Oct 2 22:29:01 CEST 2006

Call for contributions

NIPS 2006 Workshop
MACHINE LEARNING FOR MULTILINGUAL INFORMATION ACCESS

In many different settings, accessing information available in different
languages is a challenge.

In Europe, the wide variety of languages is clearly a bottleneck for
the efficient circulation of, and access to, information. More than half
of EU citizens cannot hold a conversation in a language other than their
mother tongue. Even in an officially bilingual country like Canada,
fewer than one in five people are considered to have a good enough
command of both official languages (2001 census data).

The traditional paradigm for addressing this issue is to perform human
translation on a massive scale, and rely on monolingual information
access technology. Although this model has worked reasonably well in the
past, the rapid increase in the amount of information produced (and, in
Europe, in the number of languages covered) raises questions as to its
sustainability. Machine Learning has the potential to help develop and
deploy technology that provides:

1. access to information across different languages,
2. usable translation from one language to another.

We are interested in Machine Learning techniques addressing, for
example, the following problems:

* Word alignment
* Machine translation
* Multilingual lexicon and terminology extraction
* Cross-lingual information retrieval
* Cross-lingual categorisation

Goals of the workshop:

Multilingual problems are also emerging as a promising application area
for some Machine Learning techniques, for example the use of Kernel CCA
for cross-language applications, or large-margin approaches to word
alignment. This new trend converges with a well-established interest in
learning approaches within the Natural Language Processing community.

The purpose of this workshop is to provide a forum for discussion of
current developments at the intersection between multilingual processing
and machine learning. This includes developing new techniques to address
various multilingual information access problems (e.g. translation), but
also scaling up existing techniques to the available NLP data,
developing tools for cross-language information retrieval, etc.

We will promote discussions of some inter-related key issues in applying
Machine Learning to Multilingual problems:

- Applying ML to 100-million-word corpora (e.g. SMT)
- Deploying ML solutions on new language pairs

- Languages or domains with limited bilingual corpora
- Bootstrapping limited resources

- Design of better performance measures
- Optimisation of application-specific measures
- Learning human evaluation

- Modelling and using linguistic knowledge in ML
- The continuum between all-data (SMT) and all prior knowledge
(handcrafted rules)

Submission instructions:

Researchers interested in presenting their work at the workshop should
send an email to: mlia at nrc-cnrc.gc.ca
(preferably plain text) with the following information:

- Title
- Author(s)
- Abstract (around 1 page)

Submission deadline: 29 October 2006
Notification: 6 November 2006
Workshop date: 8 or 9 December 2006

Organizers:

Cyril Goutte, National Research Council Canada (contact)
Nicola Cancedda, Xerox Research Centre Europe
Marc Dymetman, Xerox Research Centre Europe
George Foster, National Research Council Canada

Workshop format:
We intend to devote a good part of the workshop to panel discussions that
will address relevant topics in multilingual information access (MIA),
as well as invited talks presenting some important MIA problems and
associated challenges for Machine Learning. For each half day, we will
start with either a keynote or a short tutorial, continue with a few
shorter technical presentations, and end with a panel discussion (topics
to be decided depending on the confirmed list of speakers).

Invited speakers:

- Dan Melamed (Courant Institute, NYU)
- John Shawe-Taylor (ECS, U. of Southampton, UK), tbc
- Ralf Steinberger (JRC, Ispra, Italy)
- Wray Buntine (HIIT, Helsinki, Finland), tbc

Related work:
Past NIPS workshops have addressed related topics such as learning with
structured data, or the use of Machine Learning for Natural Language
Processing. There is also some ongoing interest within the European
network of excellence Pascal, as exemplified by the recent workshop on
intelligent information access. However, none of these specifically
targets multilingual aspects. We believe there is sufficient interest in
this area, and a genuine need, to justify a specific focus on
multilingual information access. The newly started European project
SMART (Statistical Multilingual Analysis for Retrieval and Translation)
is specifically targeting advanced machine learning techniques for
multilingual applications.
