### PhD position (Paris, France) - SELEXINI <https://selexini.lis-lab.fr/> project
### Semi-supervised word sense and frame induction
- Contract duration: 36 months
- Starting date: October 2022 to December 2022
- Location: LLF laboratory, computational linguistics axis <http://www.llf.cnrs.fr/en/research-topics>, Paris, France
- Advisors: Marie Candito (LLF laboratory <http://www.llf.cnrs.fr/en/research-topics>) and Carlos Ramisch (LIS lab, TALEP team <https://talep.lis-lab.fr/>)
- Net salary: 1750 € (including 64h teaching, optional)
- Application: the application file should be sent by May 9, 2022 to marie.candito at u-paris.fr and carlos.ramisch at lis-lab.fr. It should comprise:
  - a CV (max 5 pages) with transcripts (Master), diplomas, internships
  - a cover letter
  - the names and contact details of two referees

Candidates selected for an interview will be asked to send their Master's thesis or other written work supporting their qualification for the project. Interviews will be held remotely between the end of May and mid-June 2022.

SELEXINI is a research project funded by the French National Research Agency (ANR) that focuses on semi-supervised word sense induction and semantic frame induction. The project starts from the observation that identifying word meanings in context can improve both the performance and the interpretability of NLP system predictions, but that the lack of sense-annotated data with large coverage (across domains and languages) hinders the use of lexicons in modern neural NLP.

The project aims to develop a word sense induction method based on clustering occurrences, which by construction yields a sense-annotated corpus that is admittedly noisy but has large coverage. The method will be guided by pre-existing lexicons (in particular Wiktionary, available for many languages) and will make the best use of pre-trained transformer-based language models. The project also covers the generation of definitions for these induced senses and their use in a neural machine reading comprehension system, in order to improve its performance and the interpretability of its decisions.

This PhD position focuses more specifically on the semi-supervised sense and frame induction part of the project: using Wiktionary senses as seeds for constrained clustering, then grouping and structuring the induced senses into "semantic frames". The latter involves grouping occurrences of predicative lemmas according to similarities in their argument structures as observed in corpora, and grouping their semantic arguments into induced semantic roles.
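
To give applicants a concrete picture of what "senses as constrained clustering seeds" can mean, here is a minimal sketch of a seeded k-means-style procedure. It is only an illustration under strong assumptions, not the project's method: the 2-D vectors are toy stand-ins for contextual embeddings, the `seeds` mapping is a hypothetical stand-in for occurrences already linked to a lexicon sense, and the function names are invented for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def seeded_kmeans(points, seeds, n_iter=10):
    """Cluster `points` while keeping the seeded assignments fixed.

    points: list of vectors (toy stand-ins for occurrence embeddings).
    seeds:  dict {point index: sense id} (hypothetical lexicon links).
    Returns one sense id per point.
    """
    sense_ids = sorted(set(seeds.values()))
    # Initial centroid of each sense: mean of its seed occurrences.
    centroids = {
        s: mean([points[i] for i, t in seeds.items() if t == s])
        for s in sense_ids
    }
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Seeds keep their sense; other points go to the nearest centroid.
        new_labels = [
            seeds[i] if i in seeds
            else min(sense_ids, key=lambda s: dist(p, centroids[s]))
            for i, p in enumerate(points)
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Each sense owns at least its seeds, so no cluster is ever empty.
        centroids = {
            s: mean([p for p, l in zip(points, labels) if l == s])
            for s in sense_ids
        }
    return labels


# Toy data: six occurrences of an ambiguous lemma, two of them seeded.
points = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
    [5.0, 5.1], [4.8, 5.3], [5.2, 4.9],
]
seeds = {0: "river", 3: "finance"}
labels = seeded_kmeans(points, seeds)
print(labels)
```

This sketch only assigns occurrences to senses that already have a seed. Inducing senses that are missing from the lexicon, and grouping senses into frames, are open questions that the thesis is meant to address.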

- Ustalov, D., Panchenko, A., Kutuzov, A., Biemann, C. and Ponzetto, S. P., 2018, Unsupervised semantic frame induction using triclustering <https://aclanthology.org/P18-2010/>. In ACL 2018.
- Yamada, K., Sasano, R. and Takeda, K., 2021, Semantic Frame Induction using Masked Word Embeddings and Two-Step Clustering <https://aclanthology.org/2021.acl-short.102/>. In ACL 2021.
- Zhang, H., Basu, S. and Davidson, I., 2020, A Framework for Deep Constrained Clustering - Algorithms and Advances <https://ecmlpkdd2019.org/downloads/paper/62.pdf>. In Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. LNCS (11906).