[Corpora-List] Fully funded PhD position in Computational Linguistics, Paris, France

Marie Candito marie.candito at gmail.com
Thu Mar 31 18:04:51 CEST 2022

( https://selexini.lis-lab.fr/jobs/2022/03/31/phd-position )

### PhD position (Paris, France) - SELEXINI <https://selexini.lis-lab.fr/> project

### Semi-supervised word sense and frame induction


Contract duration: 36 months


Starting date: between October and December 2022


Location: LLF laboratory, computational linguistics axis <http://www.llf.cnrs.fr/en/research-topics>, Paris, France


Advisors: Marie Candito (LLF laboratory <http://www.llf.cnrs.fr/en/research-topics>) and Carlos Ramisch (LIS lab, TALEP team <https://talep.lis-lab.fr/>)


Net salary: 1750 € (including 64h teaching, optional)


Application: The application file should be sent by May 9 to marie.candito at u-paris.fr and carlos.ramisch at lis-lab.fr. It should include:

- a CV (max 5 pages) with transcripts (Master), diplomas, internships
- a cover letter
- the names and contact details of two referees

Candidates selected for interviews will be asked to send their Master's thesis or other written work supporting their qualification for the project. They will be interviewed (remotely) between the end of May and mid-June 2022.

SELEXINI is a research project funded by the French National Research Agency (ANR) that focuses on semi-supervised word sense induction and semantic frame induction. The starting observation for this project is that identifying word meanings in context can lead to better performance and interpretability of NLP system predictions, but that the lack of large-coverage sense-annotated data (coverage in terms of both domains and languages) hinders the use of lexicons in modern neural NLP.

The project aims at developing a word sense induction method by clustering occurrences, thus providing by construction a sense-annotated corpus, admittedly noisy but with large coverage. The method will be guided by pre-existing lexicons (in particular Wiktionary, available for many languages), and will make the best use of pre-trained transformer-based language models. The project also includes a part on the generation of definitions of these induced senses, as well as their use in a neural machine reading comprehension system, in order to improve its performance and the interpretability of its decisions.

The topic of this PhD position is more specifically the semi-supervised sense and frame induction part, using Wiktionary senses as constrained clustering seeds, and the grouping and structuring of induced senses into "semantic frames". The latter involves grouping occurrences of predicative lemmas, based on similarities of their argument structures observed in corpora, and grouping their semantic arguments into induced semantic roles.
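To make the idea of "senses as clustering seeds" concrete, here is a minimal sketch of seeded k-means on toy vectors. It is only an illustration under stated assumptions, not the project's method: the 2-D vectors stand in for contextual embeddings, the sense labels and the `seeded_kmeans` function are hypothetical, and the sketch only assigns occurrences to seeded senses (inducing senses absent from the lexicon would require additional unseeded clusters).

```python
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def seeded_kmeans(occurrences, seeds, n_iter=10):
    """Cluster occurrence vectors around seed vectors, one cluster per seed sense.

    occurrences: list of vectors (tuples of floats), one per word occurrence.
    seeds: dict mapping a sense label to a non-empty list of seed vectors
           (hypothetically, embeddings of lexicon examples for that sense).
    Returns a list with one sense label per occurrence.
    """
    labels = list(seeds)
    # Initialise each centroid from its seeds instead of at random.
    centroids = {s: mean(seeds[s]) for s in labels}
    assignment = []
    for _ in range(n_iter):
        # Assignment step: each occurrence goes to the nearest centroid.
        assignment = [
            min(labels, key=lambda s: dist(x, centroids[s])) for x in occurrences
        ]
        # Update step: seeds always remain in their own cluster (the constraint).
        new_centroids = {}
        for s in labels:
            members = [x for x, a in zip(occurrences, assignment) if a == s]
            new_centroids[s] = mean(seeds[s] + members)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assignment


# Toy data: two hypothetical senses of "bank", one seed vector each.
seeds = {
    "bank/finance": [(0.9, 0.1)],
    "bank/river": [(0.1, 0.9)],
}
occurrences = [(0.8, 0.2), (1.0, 0.0), (0.2, 0.8), (0.0, 1.0)]
senses = seeded_kmeans(occurrences, seeds)
print(senses)
```

Because the seeds are kept in their clusters at every update, each cluster stays anchored to a lexicon sense, which is what makes the resulting labels interpretable.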


Ustalov D., Panchenko A., Kutuzov A., Biemann C., Ponzetto S. P., 2018, Unsupervised semantic frame induction using triclustering <https://aclanthology.org/P18-2010/>. In ACL 2018.


Yamada K., Sasano R., Takeda K., 2021, Semantic Frame Induction using Masked Word Embeddings and Two-Step Clustering <https://aclanthology.org/2021.acl-short.102/>. In ACL 2021.


Zhang H., Basu S., Davidson I., 2020, A Framework for Deep Constrained Clustering - Algorithms and Advances <https://ecmlpkdd2019.org/downloads/paper/62.pdf>. In Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. LNCS (11906).
