[Corpora-List] PhD position on Multilingual Semantic Processing at LIMSI CNRS

Marianna Apidianaki marianna at limsi.fr
Tue Jan 3 20:55:15 CET 2017

== PhD position in the MultiSem project at the LIMSI CNRS lab ==

We invite applications for a PhD position in the ANR project MultiSem: Advanced Models for Multilingual Semantic Processing

The MultiSem project aims to propose novel advanced models for multilingual semantic processing capable of adapting processing to the disambiguation needs of specific lexical items, contexts and textual genres. To achieve this goal, the proposed models will combine robust machine learning methods, such as topic models and continuous space representations (Blei et al., 2003; Vulić et al., 2015; Melamud et al., 2015; Labeau et al., 2015), with traditional vector-space and knowledge-based approaches to ambiguity resolution (Erk and Padó, 2008; Kremer et al., 2014; Apidianaki, 2016). The selection of the optimal representation and approach for specific lexical items and text types will be guided by the output of ambiguity type detection (McCarthy et al., 2016) and genre identification mechanisms that will be developed during the project.

The successful candidate will work on developing dynamic multi-layer models combining high-level and fine-grained disambiguation techniques. Document-level domain and topic-related information will serve to confine selection to semantic interpretations of words valid in the processed texts and will be complemented, when necessary, with finer-grained semantic analysis performed by vector-space models and neural network representations. More specifically, the PhD student will work on the following topics:

- development of multilingual topic models exploiting language-specific and joint multilingual representations
- development of neural network representations and distributional vector-based models for fine-grained ambiguity resolution
- modelling the interaction between high-level and fine-grained disambiguation models
- design of a unified framework that joins the models and allows moving between different disambiguation layers


* Master’s degree in Computer Science or Natural Language Processing
* Fluent English is compulsory. Working knowledge of French will be a plus but is not required.
* Solid programming skills

The successful candidate should have a background in Natural Language Processing with a solid knowledge of statistics, computer programming and machine learning. Experience in semantics and/or multilingual NLP will be highly appreciated.

The PhD will be co-supervised by Marianna Apidianaki and Alexandre Allauzen in the LIMSI-CNRS lab. Applications including:

* a cover letter
* a curriculum vitae, including a list of publications if applicable
* a copy of the last degree and study transcript
* names and contact information of at least two referees

should be sent to marianna at limsi.fr and allauzen at limsi.fr


Application deadline: open until filled
Starting date: March or September 2017
Duration: 3 years (possibility of 6-month extension)
Salary: 31308€ gross / 16932€ net annual salary (health insurance included)


LIMSI is a laboratory of the French National Research Center (CNRS), a leading research institution in Europe. It is a strongly multi-disciplinary laboratory which hosts researchers from Engineering and Computer Science, Life and Social Sciences. Its scientific field covers Natural Language Processing, Human-Machine Interaction, Augmented and Virtual Reality, Fluid Mechanics, and Energetics. LIMSI is also associated with two universities: the University of Paris-Sud and the University of Paris-Saclay, which groups several computer science labs and institutions situated on the Paris-Saclay campus. The LIMSI lab is located in a green area about 30 minutes south of Paris. For more information, see https://www.limsi.fr/en/


- Marianna Apidianaki (2016) Vector-space Models for PPDB Paraphrase Ranking in Context. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2016), Austin, Texas.
- David M. Blei, Andrew Y. Ng, Michael I. Jordan, and John Lafferty (2003) Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:2003.
- Matthieu Labeau, Kevin Löser and Alexandre Allauzen (2015) Non-lexical neural architecture for fine-grained POS tagging. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2015), Lisbon, Portugal.
- Diana McCarthy, Marianna Apidianaki and Katrin Erk (2016) Word Sense Clustering and Clusterability. Computational Linguistics, Vol. 42(2), pp. 245-275.
- Oren Melamud, Omer Levy, Ido Dagan (2015) A Simple Word Embedding Model for Lexical Substitution. Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, Denver, Colorado.
- Ivan Vulić, Wim De Smet, Jie Tang and Marie-Francine Moens (2015) Probabilistic Topic Modeling in Multilingual Settings: An Overview of Its Methodology and Applications. Information Processing & Management (IP&M), Vol. 51(1), pp. 111-147.
