Starting date: October 01, 2021
Deadline for Applications: July 5th, 2021
Keywords: natural language processing, citation classification, transfer learning, deep learning
The objective of the NanoBubbles ERC project is to understand how, when and why science fails to correct itself. The project focuses on nanobiology and combines approaches from the natural sciences, computer science, the social sciences and the humanities (Science and Technology Studies) to understand how error correction in science works and what obstacles it faces. To this end, we aim to use natural language processing to trace claims and corrections across various channels of scientific communication (journals, social media, advertisements, conference programs, etc.).
The challenge is to build datasets, models and tools for organising and analysing the rapidly evolving ecology of online comments that complements conventional scientific records:
- This means not only counting references to a document but also assessing and leveraging the content of both the cited and the citing documents.
- This means not only identifying named entities, claims and counter-claims but also extracting structured knowledge from text.
- This means not only taking advantage of existing data to learn models but also building tools to create and annotate new datasets so as to train advanced language models.
Citations are an important indicator of the state of a scientific field. They reflect how authors frame their work and influence its adoption by future researchers. However, despite recent work in NLP [Bakhti2018,Jurgens2016,Pride2019,Yu2020], large-scale, in-depth analyses of citation behaviour, and of how citations can reveal error correction, are still lacking.
The objective of this PhD is to design new NLP methods to detect and qualify citations and to extract citation networks from scientific research literature.
[Bakhti2018] Bakhti, K., Niu, Z., Yousif, A., & Nyamawe, A. S. (2018, August). Citation function classification based on ontologies and convolutional neural networks. In International Workshop on Learning Technology for Education in Cloud (pp. 105-115). Springer, Cham.
[Jurgens2016] Jurgens, D., Kumar, S., Hoover, R., McFarland, D., & Jurafsky, D. (2016). Citation classification for behavioral analysis of a scientific field. arXiv preprint arXiv:1609.00435.
[Pride2019] Pride, D., Knoth, P., & Harag, J. (2019, June). ACT: an annotation platform for citation typing at scale. In 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (pp. 329-330). IEEE.
[Yu2020] Yu, W., Yu, M., Zhao, T., & Jiang, M. (2020, April). Identifying referential intention with heterogeneous contexts. In Proceedings of The Web Conference 2020 (pp. 962-972).
Master 2 in Natural Language Processing, computer science or data science.
Programming experience in Python and in a deep learning framework.
Previous experience in named entity recognition (NER), relation extraction (RE) and dataset manipulation would be a plus.
The thesis will be conducted within the Sigma and Getalp teams of the LIG laboratory (http://sigma.imag.fr/ and https://lig-getalp.imag.fr/). The recruited candidate will join teams that offer a stimulating, multinational and pleasant working environment. The resources needed to carry out the PhD will be provided, both in terms of missions in France and abroad and in terms of equipment (personal computer, access to the LIG GPU servers).
The recruited candidate will also be required to collaborate with several teams involved in the NanoBubbles ERC project, in particular with researchers from the IRIT lab (Toulouse, France) and the University of Paris Sorbonne, as well as researchers from Maastricht University, Radboud Universiteit and the University of Twente in the Netherlands.
Instructions for applying
Applications will be accepted until July 5th, 2021. They must include a CV, a letter or message of motivation, Master's grades and one or more letters of recommendation, and be addressed to Cyril Labbé (cyril.labbe at imag.fr), François Portet (Francois.Portet at imag.fr) and Yasemin J. Erden (y.j.erden at utwente.nl).
Applications will be reviewed as they arrive, so applicants are advised to apply as soon as possible.