DESCRIPTION OF THE THESIS TOPIC Neural embeddings, used to construct semantic representations for words as well as sentences, have become ubiquitous in the field of natural language processing. Although their use leads to very good performance in a wide range of tasks, they present difficulties related to their opacity. In particular, much uncertainty remains about their ability to capture linguistic knowledge. The goal of this PhD studentship will be to investigate the ability of neural embeddings to acquire specific linguistic phenomena.
The first of these phenomena concerns the selectional preferences of verbal predicates. Most verbs prefer arguments that belong to particular semantic classes. In theory, sentence embeddings implicitly model these preferences, as well as the semantic features of the predicate and its arguments. However, it remains unclear whether they can correctly model more irregular cases, where the verb selects a semantically atypical argument. The thesis will include a thorough study of the interaction between the verb and its arguments during the construction of sentence embeddings: how is the representation of the verb contextualized in the presence of its arguments, and vice versa?
Related to the first phenomenon, we will study the extent to which sentence embeddings are able to capture subcategorization frames. In a given context, a verb selects a subcategorization frame that depends on its syntactic and semantic properties, as well as those of the context. Again, sentence embeddings theoretically capture this information, but are they also able to identify irregular arguments? The study will examine the behavior of sentence embeddings in contexts where the arguments either do or do not conform to the verb's subcategorization frame.
WORK CONTEXT The candidate will carry out their research jointly at the CLLE laboratory (CARTEL team) and the IRIT laboratory (Melodi team) at the University of Toulouse. CLLE develops NLP methods and tools for linguistics, while IRIT is a computer science laboratory internationally recognized for its research on artificial intelligence and natural language processing; IRIT is also one of the founding laboratories of ANITI (3IA), the artificial intelligence institute of Toulouse.
FUNDING The thesis will be funded for a period of 3 years. The monthly remuneration is 2135 euros gross (1715 euros net).
PROFILE
● Master's degree in NLP or computer science;
● strong knowledge of NLP and machine learning;
● strong programming skills (Python);
● good command of English and French;
● good writing and oral presentation skills.
APPLICATION Applications must be filed at:
INQUIRIES For more information, feel free to contact Nabil Hathout (nabil.hathout at univ-tlse2.fr) or Tim Van de Cruys (tim.vandecruys at irit.fr).
ADDITIONAL INFORMATION Keywords: neural embeddings; compositionality; contextual models; subcategorization frames. The PhD will be co-directed by Nabil Hathout (CNRS, CLLE) and Tim Van de Cruys (CNRS, IRIT).
-- CLLE/ERSS, CNRS & Université de Toulouse Jean Jaurès Maison de la Recherche. F-31058 Toulouse cedex 9 Tél. (+33) 561-503-603. Nabil.Hathout at univ-tlse2.fr http://w3.erss.univ-tlse2.fr/membre/hathout/