[Corpora-List] CFP: Computationally Hard Problems in Speech and Language Processing

Hal Daume III hdaume at ISI.EDU
Mon Dec 12 22:47:00 CET 2005


WEBSITE: http://www.cis.upenn.edu/~ryantm/naaclWS06/

Call for Papers

Modern machine learning methods (such as maximum entropy models,
decision trees and support vector machines) have provided the language
processing community with robust tools for supervised learning
problems with simple outputs. These methods provide natural
mechanisms for representing linguistic knowledge through weighted
features and obtain high accuracy by maximizing performance on
training data.

Unfortunately, when one wishes to apply similarly robust techniques to
problems with complex outputs, one is limited to models with nice
computational properties, such as those that can be formulated in
terms of sequence or tree representations. Even for these relatively
simple structures, polynomial time algorithms exist only under strong
restrictions on the locality of features (the Markov assumption for
sequences and the context free assumption for trees). Moreover, even
under such assumptions, the complexity of these algorithms can grow
unwieldy even while remaining polynomial, for instance in the case of
synchronous or lexicalized context free grammars.

Recent work on ranking, sampling and other approximate solutions to
such problems have indicated that researchers are coming back to the
hard problems in speech and text, for which efficient algorithms do
not (or are not known to) exist. Some might even argue that language
processing has more to gain from richer and more global feature
representations than it does from new machine learning solutions.

The purpose of this workshop will be to explore new research aimed at
identifying and solving speech and language processing problems that do
not have practical computational properties. We also wish to explore
the adaptation of current state-of-the-art inference and learning
algorithms to problems beyond sequence and tree analysis. In
particular the workshop will have the following themes:


- Formal investigations on the computational properties of new and old
problems in speech, parsing, translation, summarization, information
extraction and other common language processing areas.

- Moving beyond the Markov and context-free assumptions of our
underlying representations. Identifying the linguistic
(im)plausibility of these assumptions. Global feature functions
that incorporate much richer information about sentence and document
level properties as well as long distance dependencies.

- Joint representations such as those incorporating both syntax
(phrase-structure, dependencies, etc.) and semantics (entities,
relations, predicate-arguments).

- Efficient solutions to problems that have historically been viewed
as intractable.

- Appropriate (natural) representations of problems, and how this
effects complexity/performance. Structure of search.

- Identifying when old solutions (e.g. classification, sequential
labeling) to new problems are appropriate and when more flexible
solutions are required.


- Approximate inference algorithms. The pros and cons of pruning
techniques. New approximation algorithms for hard problems
including those based on beam search, sampling or other motivated
methods. Surveys of known approximation and pruning methods
identifying specific properties of success and failure. Efficiency
vs. performance trade-offs. Formal characterizations of search
errors, and their relation to complexity.

- Approximate parameter estimation. Training techniques for models in
which parameter estimation is intractable for both generative and
discriminative frameworks. The effects of approximate parameter
estimation on performance. Learning parameters with respect to
approximation in search. Batch vs. online learning techniques when
combined with approximate inference.

- Joint structures and complex systems. When are pipelined methods
sufficient? When do we need joint learning and inference to achieve
maximum benefit? Are the benefits of joint inference and learning
for factorizable structures worth the computational cost?

- The trade off between loss functions and tractability. When is it
necessary to use structure, or can components be predicted
separately? Theoretical or practical comparisons between
computational complexity and performance for different loss

- Differences between complex one-pass systems versus
stacked/multi-pass "decoding."

We encourage the submission of papers that address any of the above

This workshop is interested in completed work as well as works in

Submissions should be a maximum of 8 pages and should use the
HLT-NAACL formatting guidelines that can be obtained here.
Furthermore, the reviewing process will be blind so names and
affiliations need to be removed from the title page as well as
all self-references that reveal the author's identity.

Submissions should be sent to ryantm at cis.upenn.edu and should
have "HLT-NAACL Workshop Submission" in the title.


- March 3, 2006: Paper submissions due by 11:59pm EST
- April 7, 2006: Paper acceptance notification
- April 21, 2006: Camera ready versions due
- June 9, 2006: Workshop

Workshop Website: http://www.cis.upenn.edu/~ryantm/naaclWS06/

Organizers and PC


* Ryan McDonald (UPenn) - contact ryantm at cis.upenn.edu
* Hal Daumé III (ISI)
* Fernando Pereira (UPenn)

Program Committee

* Jeff Bilmes (Washington)
* Michael Collins (MIT)
* Jason Eisner (Johns Hopkins)
* Radu Florian (IBM)
* Liang Huang (UPenn)
* Thorsten Joachims (Cornell)
* Chris Quirk (Microsoft)
* Dan Roth (UIUC)
* Noah Smith (Johns Hopkins)
* Charles Sutton (UMass)
* Ben Taskar (Berkeley)

More information about the Corpora-archive mailing list