[Corpora-List] CFP: Special Issue of JNLE: Finite State Methods and Models in Natural Language Processing

Jakub Piskorski jakub.piskorski at frontex.europa.eu
Sat May 8 00:44:04 CEST 2010


Natural Language Engineering SPECIAL ISSUE ON Finite State Methods and Models in Natural Language Processing

Special Issue Description --------------------------------

The languages described by regular expressions are exactly those recognized by automata that have a finite number of states (Kleene 1956). This fundamental result has been extended to string transductions, sets of trees, formal power series, grammars, semigroups, and finite models. Furthermore, the interest in finite equivalence relations in these systems has led to the study of further formal properties of recognizable languages. The study was pioneered by Schützenberger, McNaughton, Papert and Kamp, who established the equivalence of star-free generalized regular expressions, counter-free automata, first-order logic and temporal logic. Kleene's theorem, its extensions and its restrictions have tremendous methodological relevance to speech and natural language processing (NLP). In 1972, C. Douglas Johnson pioneered the study of formal aspects of phonological descriptions by reducing phonological grammars to finite state systems. This was the dawn for the currently existing finite state approached to phonology and morphology with wide-spread applications to written languages. In addition, many other NLP areas admit that they employ finite state systems or their extensions. In particular, we mention decision trees, deterministic parsing, chunk parsing, edit distance, rational kernels, hidden Markov models, Viterbi algorithms, semiring parsing, and weighted tree automata. In sum, finite state methods, both statistical and knowledge-based, are now being used in various areas of linguistic computation, ranging from speech and character recognition to message understanding and statistical machine translation. Finite state methods in NLP continue to be an area of further research and growing area. The current special issue has two goals. The first is to summarize the state of the art in the finite state methods and models of the NLP community. The second is to increase awareness of the community about open issues and new perspectives: the study of the adequacy of finite-state methods and models extends from the traditional richness and use to new generalizations and ambitious tasks. The call for papers is open to everyone: New contributions are warmly encouraged; in addition, the issue is ideal *also* for extended and updated versions of the papers presented in workshops on Finite State Methods and Natural Language Processing (FSMNLP Helsinki 2005, Potsdam 2007, Ispra 2008, and Pretoria 2009). Submissions of completely new papers under the special theme are welcomed as equally appropriate as extended and updated papers.

Topics of Interest ------------------ 1. New or updated work on the traditional topics of FSMNLP workshops The traditional topics in the series of FSMNLP workshops are appropriate. The submitted paper must clearly deal with finite states. An extended version of a meeting paper can be submitted, provided that the contribution has been updated as appropriate given the progress in the field. The forums for any preliminary versions of the paper must be indicated.

2. Study of new questions raised by fundamental results in finite state phonology and morphology In linguistics, morphological grammars describe the structure of word forms in a language, and phonological grammars describe the systematic use of sounds as units that encode the word forms. Finite state phonology and morphology consider grammars and rules whose formal semantics can be defined using finite state transducers. The fundamental result in these fields states that one can construct finite state transducers aka lexical transducers from phonological (Johnson 1972, Kaplan and Kay 1994) and morphological grammars (Koskenniemi 1983, Karttunen 1994, Beesley and Karttunen 2003). This result is not the end of inquiries but it paves the way for the study of many interesting problems, including the following: - adaptations of the fundamental result to less rigid formalisms and

descriptive approaches used in field linguistics - correlation between availability of computational morphology and

language development of under-resourced languages - approaches to the adaptation of lexical transducers to a cluster of

languages - seamless construction of computational lexicons from dynamically

changing linguistic descriptions - model-checking and automatic verification of phonological grammars - finite state approaches to language variation and diachronic

description - portability and long-term archiving of resources in computational

morphology and morpho-syntax - constructing refreshed lexical transducers quickly from updated

extended regular expressions - lexical transducers for tonal languages from auto-segmental accounts

of spreading and alignment of articulatory features - learning and training finite state grammars from small samples - designs of weighted phonological and morphological grammars - feasible finite state restrictions of optimality-theoretic and

multi-tiered phonology - efficient finite state re-implementation of competitively efficient

ad hoc methods (see the article by S. Wintner in NLE 14(4) 2007).

3. Study of new methods with connections between languages, trees and finite state automata The theory of classical string automata has a natural extension to tree automata, which found applications in NLP. In 1982, Joshi and Levy pointed out in Computational Linguistics 8(1) that phrase structure grammars actually generalize to tree automata that bring more descriptive power. The link between phrase-structure grammars, tree grammars and tree automata has enriched the study of enumerative grammars that generate trees and has given birth to model-theoretic grammars that instead describe trees. The study includes but is not restricted to: - representations of languages by tree automata and tree grammars - non-projectivity and partial commutativity of yields of tree

automata and grammars - hierarchies of tree automata and tree grammars - weighted extensions of representations of languages and

transductions - tree logics and model-theoretic syntax - language generation through tree logics - efficient and dynamic construction of automata from updated rules or

logical formulas - representations of trees and automata-based query languages for tree

banks - learning, training and minimization of representations of languages

and transductions - semiring parsing and other generalizations of classical parsing

algorithms - applications and case studies of methods based on the theory of tree

automata. In 1963, Chomsky and Schützenberger gave a morphic representation for the context-free languages i.e. the yields of local tree automata. The representation involves bracketed strings that encode the structure of the parse trees. Such use of bracketed strings and local trees gives rise to methods that decompose tree automata into simpler components many of which have regular yields. The relevant topics include: - representation of tree grammars and tree automata through

decompositions into constraints - Chomsky-Schützenberger representations for tree grammars and

weighted tree grammars - feasible constituent, dependency and alignment bracketing schemes

for grammars, tree banks and parallel corpora - intersection grammars and conjunctive grammars with recognizable

sets of string or trees - chart parsing, restarting automata, nested word automata and visibly

pushdown languages - regular bracketed string languages that approximate sets of trees.

4. Study of advantages of restrictions defined in algebraic theories of automata and languages or in finite model theory Feasible finite state methods in natural language processing are an application of a vast body of basic research in mathematics and computer science. There remain, however, situations where straightforward and general methods fail to be applicable or efficient. It is in these situations where NLP applications motivate interest in special families of regular relations, automata, semirings, and formal power series. Connections between such families and NLP tasks have not yet been fully elaborated. If we are lucky, the study of the connections will lead to new fundamental results that give rise to freshly identified subfamilies of finite state methodologies in NLP. The linguistically relevant restrictions on the power of general-purpose finite state methods might include: - finite ambiguity - sequential functions - step functions - idempotent semirings - locally finite semirings - descriptive complexity - deterministic transition closure - first-order definability - aperiodic, counter-free and star-free sets - unambiguous concatenation - partial commutativity - discounted weights - finite synchronization delay - limited center-embedding - local testability - finite bandwidth. In addition, the topics of interest include tools that support experiments with these restrictions and their specialized algorithms: - new kinds of implementations of finite state compilers, libraries

and on-demand operations - optimized algorithms for manipulation of automata or related

formalisms - benchmarks suitable for evaluation of performance of algorithms - methods that construct, minimize or decompose automata or


Important Dates --------------- - Deadline for submissions: 23 May 2010 - First decision: 10 July 2010 - Submission of revised version: 15 August 2010 - Final decision: 22 September 2010 - Submission of camera-ready versions: 23 October 2010 Submission ---------- Articles submitted to this special issue must adhere to the NLE instructions for contributors. We encourage authors to keep their submissions below 30 pages. Up to date information will be available at http://www.ling.helsinki.fi/projects/jnle/.

Editorial Board -------------------

Guest Editors - Anssi Yli-Jyrä (Department of Modern Languages, University of

Helsinki, Finland)

firstname.yli-jyra at helsinki.fi<mailto:firstname.yli-jyra at helsinki.fi> - András Kornai (Budapest Institute of Technology, Hungary and

MetaCarta, Cambridge, USA) - Jacques Sakarovitch (CNRS and Ecole Nationale Supérieure des

Télécommunications, Paris, France)

Guest Editorial Board ---------------------------

- Julie Berndsen (School of Computer Science & Informatics, University

College Dublin, Ireland) - Francisco Casacuberta (Instituto Tecnologico De Informática,

Valencia, Spain) - Jean-Marc Champarnaud (Université de Rouen, France) - Jan Daciuk (Gdansk University of Technology, Poland) - Manfred Droste (Institut für Informatik, Universität Leipzig,

Germany) - Dafydd Gibbon (Fakultät für Linguistik und Literaturwissenschaft,

University of Bielefeld, Germany) - Colin de la Higuera (University of Nantes, France) - Lauri Karttunen (Palo Alto Research Center and Department of

Linguistics, Stanford University, USA) - André Kempe (CADEGE Technologies & Consulting, France) - Kevin Knight (Computer Science Department, University of Southern

California) - Hans-Ulrich Krieger (DFKI GmbH, Saarbrücken, Germany) - Marco Kuhlmann (Department of Linguistics and Philology, Uppsala

University, Sweden) - Andreas Maletti (Universitat Rovira i Virgili, Spain) - Stoyan Mihov (Bulgarian Academy of Sciences, Sofia, Bulgaria) - Mark-Jan Nederhof (School of Computer Science, University of St

Andrews, Scotland) - Kemal Oflazer (Sabanci University, Turkey) - Jakub Piskorski (Polish Academy of Sciences, Warsaw, Poland) - Michael Riley (Google Research, New York, USA) - Strahil Ristov (Ruder Boskovic Institute, Zagreb, Croatia) - Max Silberztein (Université de Franche-Comté, France) - Bruce Watson (Dept. of Computer Science, University of Pretoria,

South Africa) - Menno van Zaanen (Department of Communication and Information

Sciences, Tilburg University, the Netherlands)

Contact ------- Anssi Yli-Jyrä Department of Modern Languages University of Helsinki firstname (dot) lastname-with-dash (at) helsinki (dot) fi

-------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/html Size: 14802 bytes Desc: not available URL: <https://mailman.uib.no/public/corpora/attachments/20100508/482f9aa7/attachment.txt>

More information about the Corpora mailing list