Topic: Conditional Random Fields for structured annotation

In natural language processing, statistical methods have become very popular thanks to their effectiveness in handling noisy inputs (e.g., automatic speech transcripts, blogs, tweets). For sequence labeling problems in particular, such as named entity recognition (finding person names, places, dates, etc. in a text), the Conditional Random Fields (CRF) model is especially effective.

CRFs are currently limited to modeling unstructured sequences (a flat representation), whereas one of today's open problems is predicting structured sequences (for example, tree-structured ones). Methods for modeling such structured sequences exist, such as Probabilistic Context-Free Grammars (PCFGs), but their effectiveness is limited when they are confronted with noisy inputs like those we aim to process.

The proposed work is to study a method for predicting sequences of structured objects that is robust to noisy inputs. This method will decompose the overall problem into sub-problems, each modeled by a CRF, and then recombine their results to produce a structured output. Evaluation will be performed on noisy texts (speech transcripts, tweets, etc.).

This work takes place in the context of the Quaero project, funded by the French National Innovation Agency (www.quaero.org). The work will be performed at IRISA/INRIA Rennes, France (http://www.irisa.fr/, http://www.inria.fr/centre/rennes/). The candidate will join the TexMex team, whose main research topics include large-scale multimedia indexing, speech processing, and information retrieval.


The successful candidate will hold a PhD and have a track record of research in text mining or machine learning for natural language processing. Fluency in English is mandatory.

This position is to be filled as soon as possible and will end on 31 December 2013. Salary follows INRIA scales and depends on the candidate's experience (the minimum net monthly salary is about 1900 €).

To apply, please send a cover letter describing how the applicant's knowledge and research background will contribute to the project, a CV, and the names and contact information of two referees to:

Christian Raymond (christian.raymond at irisa.fr)

