[Corpora-List] Open PhD position (4 years) on Generalizability of NLP experiments at the Vrije Universiteit Amsterdam

Antske Fokkens antske.fokkens at vu.nl
Fri Aug 14 16:03:13 CEST 2020

We (the CLTL lab at the Vrije Universiteit in Amsterdam) have a 4-year PhD position in computational linguistics which aims to investigate to what extent the outcomes of NLP experiments generalize to real-life use cases. This will be investigated in the context of using NLP models for detecting online verbal aggression for research in the social sciences.

Are you interested in fundamental methodological questions of NLP and in investigating NLP models for detecting online verbal aggression? Then please apply by *August 31st* through the official website: https://workingat.vu.nl/ad/phd-hybrid-intelligence-generalizing-nlp-experiments/kvsnvm

*Full Advertisement*

Why build AI systems that replace people if we can build AI systems that collaborate with people? Hybrid Intelligence is the combination of human and machine intelligence, expanding human intellect instead of replacing it. Our goal is to design Hybrid Intelligent systems, an approach to Artificial Intelligence that puts humans at the centre, changing the course of the ongoing AI revolution.

The project will be recruiting for 15 PhD or postdoc positions in total. For more information on the project, see www.hybrid-intelligence-centre.nl

At the Vrije Universiteit Amsterdam we are looking for a PhD candidate in computational linguistics for the project Predicting generalizability of NLP experiments <https://workingat.vu.nl/ad/phd-hybrid-intelligence-generalizing-nlp-experiments/kvsnvm>

*Location:* Amsterdam *FTE:* 0.8 - 1

*Project Description*

Natural language processing has a strong tradition in experimental research, where various methods are evaluated on gold standard datasets. Though these experiments can be valuable for determining which methods work best, they do not necessarily provide sufficient insight into the general quality of our methods for real-life applications. Beyond the outcome of a typical NLP experiment, two questions often need to be addressed before we know whether a method is suitable for a real-life application. First, what kind of errors does the method make and how problematic are they for the application? Second, how predictive are results obtained on the benchmark sets for the data that will be used in the real-life application? This project aims to address these two questions by combining advanced systematic error analyses with formal comparison of textual data and language models.

Potentially erroneous correlations were still relatively easy to identify in the days of extensive feature engineering and methods such as k-nearest neighbors, Naive Bayes, logistic regression and SVMs, but this has become more challenging now that technologies predominantly make use of neural networks. The field has become increasingly interested in exploring ways to interpret neural networks, but, once again, many studies focus on field-internal questions (What linguistic information is captured? Which architectures learn compositionality, and to what extent?). We aim to take this research a step further and see if we can use insights into the workings of deep models to predict how they will work for specific applications that make use of data different from the original evaluation data. Both error analysis and formal comparison methods will contribute to establishing the relation between generic language models, task-specific training data, evaluation data and "new data". By gaining a more profound understanding of these relations, we aim to define metrics that can be used to estimate or even predict to what extent results on a new dataset will be similar to those reported on the evaluation data (both in terms of overall performance and in terms of types of errors).

As a use case, we will look at hate speech and offensive language detection. We will collaborate with social scientists who want to study online aggressive behavior using the models we build for Dutch and English (and possibly other languages).

*Your duties*

You will be working on a PhD on the topic described above. In particular, you will:

- carry out reproducible experiments on models for detecting online verbal aggression
- collaborate on designing a framework for testing the generalizability of the outcomes of the experiments on such models
- investigate to what extent various models are suitable for social science research, based on systematic error analyses and the generalizability results
- write research articles and present your work at international conferences, culminating in a successful dissertation


The prospective candidate has a Master's degree (MA/MSc) or equivalent in computational linguistics or a related field (AI, Computer Science with a focus on NLP). Candidates from other fields with a strong background in machine learning and knowledge of linguistics can also apply. Solid programming skills are required. The project involves interdisciplinary collaboration, so the ability to communicate with researchers from different domains is important. Experience with or knowledge of statistical analysis is a plus.

If you also want to be considered for one of our other PhD positions, please also upload your documents to our Hybrid Intelligence talent pool at *https://bit.ly/HIC-Talentpool*

Your information will then be shared among the researchers in the consortium, and you may be approached for one of the other positions listed on *www.hybrid-intelligence-centre.nl/jobs*

*What are we offering?*

A challenging position in a socially involved organization. You will be part of the Hybrid Intelligence Center, a collaboration between top AI researchers from six Dutch universities. You will receive joint supervision from:

- dr. Antske Fokkens (Associate Professor in Computational Linguistics, Vrije Universiteit)
- dr. Eric Nalisnick (Assistant Professor in Machine Learning, University of Amsterdam)
- dr. Ivar Vermeulen (Associate Professor in Communication Science, Vrije Universiteit)

The salary will be in accordance with university regulations for academic personnel and amounts to €2,395 (PhD) per month during the first year, increasing to €3,061 (PhD) per month during the fourth year, based on full-time employment. The job profile is based on the university job ranking system and is vacant for at least 0.8 FTE.

The appointment will initially be for 1 year. After a satisfactory evaluation of the initial appointment, the contract will be extended to a total duration of 4 years (or 5 years in the case of a 0.8 FTE contract).


Please apply through the official job application site <https://workingat.vu.nl/ad/phd-hybrid-intelligence-generalizing-nlp-experiments/kvsnvm> and make sure to include your:


- CV (including a link to examples of your computational work, e.g. GitHub, if available)
- Motivation letter
- Academic transcripts (including the names of the courses you took and your grades)

For questions about the project, please contact: antske.fokkens at vu.nl. Make sure to mention [generalizability] in the email subject line.

--
dr Antske Fokkens
--
Associate Professor
Computational Lexicology & Terminology Lab (CLTL)
Web & Media Group
The Network Institute, VU University Amsterdam

De Boelelaan 1105
1081 HV Amsterdam, The Netherlands
