Natural language processing for large literary corpora

12-month post-doctoral position, at Lattice (Montrouge & Paris), funded by PRAIRIE (Paris Artificial Intelligence Research Institute)

https://tinyurl.com/stqtocq

Thierry Poibeau, thierry.poibeau at ens.fr
https://prairie-institute.fr/chairs/poibeau-thierry/
https://www.lattice.cnrs.fr/en/members/direction/thierry-poibeau/

This position is about the use of advanced Natural Language Processing techniques for the analysis of large literary corpora.

We now have at our disposal large literary corpora (several hundred, even thousands of novels) and a wide collection of robust and efficient natural language processing (NLP) tools, for English, but also for languages like French. It is possible, for example, to syntactically parse the entire work of an author or a group of authors in a few minutes, with reasonable quality.

The use of advanced NLP techniques for the analysis of literary corpora has given rise to original studies, for example on modelling suspense, the personality of characters, or their interaction networks. However, very few studies have been carried out on French literary corpora so far, despite the availability of efficient tools and corpora.

The post-doctoral candidate will develop advanced NLP techniques for the analysis of large literary corpora, whether novels or other kinds of literary texts (one example could be the study of what is called in French the "roman populaire", a vague category that is poorly defined in terms of literary features). Possible tasks include:

- the identification of specific narrative patterns and their distribution in the text (textual topology),
- the identification of narrative breaks and recurrences based, for example, on the application of latent semantic segmentation techniques,
- the identification of (diachronic) subgenres,
- the identification of subjective patterns using sentiment analysis techniques.

The expected result will be the definition of a "literary archetype" and its comparison with previous literary studies. The position requires some familiarity with recent machine learning techniques used in NLP and, preferably, a good command of French. A key question is how to turn the results of advanced NLP tools into something interesting for literary studies (for example, how can we exploit syntactic annotations for stylistic studies?).

Skills required

- A good knowledge of recent ML and NLP techniques
- Advanced programming skills (preferably in Python)
- A good command of French
- Interest in French literature and digital humanities

Previous relevant publications are of course a plus.

Practical details

12-month post-doctoral position (an extension is possible). The workplace will be Lattice (École normale supérieure, 1 rue Maurice Arnoux, 92120 Montrouge, 5 min from the nearest metro station, in the Paris area). The salary will be set according to the candidate's profile and experience, following the PRAIRIE salary scale.

How to apply?

Send a detailed CV, a cover letter and other relevant documents (one or two recent publications, recommendation letters, etc.) by e-mail to Thierry Poibeau. The position is open until filled; the start date can be as early as 1 April, or later depending on the candidate.

