[Corpora-List] Scientific data curator – 24 months – Grenoble, France

François Portet francois.portet at imag.fr
Wed Nov 10 14:33:04 CET 2021

Scientific data curator – 24 months – Grenoble, France

*Starting date:* January 03, 2022 at the earliest

*Duration:* full-time position for 24 months (with a possibility of reappointment)

*Deadline for Applications:* November 30th, 2021

*Location:* The position will be based in Grenoble, France

*Remote work* will be possible, e.g., 1 day/week

*Keywords:* Corpus, digital humanities, data collection


The NanoBubbles ERC Synergy project’s objective is to understand how, when and why science fails to correct itself. The project focuses on claims made within the field of nanobiology. Project members combine approaches from the natural sciences, computer science, and the social sciences and humanities (Science and Technology Studies) to understand how error correction in science works and what obstacles it faces. For this purpose, we aim to trace claims and corrections through various channels of scientific communication (journals, social media, advertisements, conference programs, etc.) via both qualitative and digital methods.

Your contribution to the main project will be to advise on, run and/or maintain software and systems that support activity related to collection, analysis, storage and presentation of textual data and metadata.

This is an exciting opportunity to join a highly interdisciplinary research team working at the forefront of Science and Technology Studies, Digital Humanities, ethics of/in research, and nanoscience.

You will:

* *Build corpora* with data collected from heterogeneous sources (e.g., bibliographic databases like Scopus or Dimensions, full-text databases like ISTEX or open archive repositories, social networks, post-publication peer-review platforms and other online tools allowing annotations and comments…)

* *Process and transform data*, organize data flow to database, create formal links between datasets.

* *Curate* metadata

* *Develop scripts* for data collection via APIs (preferably: Python, SQL, Java, R) and web scraping (e.g., HtmlUnit, Selenium)

* Contribute to the development of a *common vocabulary* and map it to existing ontologies

* Implement and manage various software pipelines to support *data analysis and text mining*.

* Help the other team members to run experiments and validate their


* *Document the data lifecycle* and update the *data management plan*

You will work closely with PhD students, interns and researchers of the ERC project.

You will also benefit from the skills and the research environment of two research units: the LISIS (http://umr-lisis.fr) and the LIG (https://www.liglab.fr/en).


Master’s degree in data science, digital humanities or computational social sciences.

Very good knowledge of English

Qualifications in corpus linguistics tools, corpus-based research, quantitative and qualitative data analysis, natural language processing or computational linguistics are a plus.

Instructions for applying

Applications will be accepted until November 30th, 2021.

Please send CV + letter/message of motivation + grades from previous education + references for potential letter(s) of recommendation to:

Frederique Bordignon (frederique.bordignon at enpc.fr),

Cyril Labbé (cyril.labbe at imag.fr),

and Cyrus Mody (c.mody at maastrichtuniversity.nl).
