[Corpora-List] Fully Funded PhD fellowship in computer science: Text mining for the systematic survey of diagnostic tests

Aurélie Névéol aneveol at gmail.com
Mon Jun 13 14:50:44 CEST 2016

*3-year fully funded Marie Curie PhD fellowship: Text mining for the systematic survey of diagnostic tests*

The Information, WrittEN and Signed Language (ILES) group at the Computer Science Laboratory for Mechanics and Engineering Sciences (LIMSI) of CNRS in Paris is inviting applications for a PhD fellowship funded by the European project MiRoR: "Methods in Research on Research". Part of the Marie Skłodowska-Curie Actions (ITN-EJD), MiRoR involves several renowned teams <http://miror-ejd.eu/beneficiary/> throughout Europe and will provide a unique environment for conducting high-level PhD research.

We are looking for a PhD candidate for the *ESR12 project*, which addresses computer science methods with an application to the field of biomedicine: Text mining for the systematic survey of diagnostic tests in published and unpublished literature. The full description of the project is available here <http://miror-ejd.eu/individual-research-projects/>.

The selected candidate will:

* Be based at the CNRS in Paris, with secondments at the University of Amsterdam and at Cochrane

* Be employed full-time for three years, starting in October 2016

* Receive a double doctorate from the University Paris Saclay and the University of Amsterdam

* Benefit from participation in network-wide meetings and training activities.

*The ideal applicant has:*

* A Master's degree in Computer Science, Computational Linguistics or a related field;

* An understanding of methods in machine learning and computational linguistics;

* An interest in acquiring knowledge of the biomedical domain, if not prior experience in clinical research or medicine;

* Prior experience in research projects;

* Good communication and writing skills.

*Eligibility requirements:*

* Applicants must hold a Master's degree and must not have been awarded a doctoral degree at the time of recruitment (October 2016);

* Applicants must have less than 4 years of full-time research experience (measured from the date they obtained the degree that formally entitles them to enroll in a PhD);

* Applicants must not have resided or carried out their main activity (work, studies, etc.) in France for more than 12 months in the 3 years immediately prior to recruitment. Short stays such as holidays and/or compulsory national service are not taken into account.

To express interest, please submit a CV, a motivation letter (indicating your research background and interests), a copy of your Master's diploma, recent transcripts, and contact information for two referees at the following address: http://miror-ejd.eu/submission-for-applicant/.

Informal enquiries can be made to aurelie.neveol at limsi.fr

*Deadline: June 26, 2016.*

*Project description*


In medicine, diagnostic tests are any tests performed to help clinicians diagnose a disease or detect a specific health condition. Diagnostic tests can be invasive (e.g. amniocentesis), minimally invasive (e.g. blood tests) or non-invasive (e.g. urine analysis). Because tests differ in invasiveness and cost, it is crucial to weigh their possible benefits against the financial and psychological burden associated with the tests and any resulting follow-up. The Cochrane Collaboration has developed a methodology to systematically review diagnostic test accuracy based on the published literature [1].

Diagnostic tests strongly influence clinical decisions. Yet information on their utility and accuracy is not always readily available in the published literature. In addition to delays in publishing reports on diagnostic test studies, there is a strong publication bias: diagnostic test studies are first presented at major medical conferences, and only about 50% of them go on to be published in medical journals. The exact nature of this publication bias is not well understood.

Another challenge when looking for diagnostic test accuracy studies is that they are difficult to identify and retrieve [2]. As a result, a typical search strategy for diagnostic test accuracy will retrieve around 5000 initial hits, of which a couple of hundred must be read in full text and only around 10 to 20 will be included in the review.

There is a need to understand and monitor the information trail on diagnostic tests in order to inform clinicians and patients. The hypothesis explored in this research project is that text mining methods [3] can offer an efficient way to gather information on diagnostic tests from medical conference abstracts and journal articles.


This Ph.D. project aims to:

· Provide the scientific community with tools that track information about diagnostic test studies

· Develop natural language processing techniques for (a) identifying conference abstracts and journal articles that report on diagnostic tests and (b) extracting specific information about diagnostic tests (study characteristics), including test description, accuracy and use

· Implement a recommendation system that will retrieve diagnostic test studies for inclusion in systematic reviews, as well as identify specific study characteristics

· Populate a knowledge base with comprehensive information about diagnostic tests, to inform researchers and clinicians

· Update automatic predictions over time using the labeled data obtained through interactive annotation


This project will use a selection of Cochrane diagnostic test accuracy reviews to build a large reference corpus of conference abstracts and journal articles relevant to diagnostic test studies. The corpus will also be annotated with gold-standard characterizations of test description, accuracy and use.

A classification tool will be developed both to help experts independently identify relevant diagnostic test studies and to assist systematic review writers with selecting studies for inclusion in their reviews. Once a representative corpus has been obtained, a contrastive study of how diagnostic test studies are reported in conference abstracts vs. journal articles will be conducted. Similarly, an information extraction module will be developed to extract specific information related to diagnostic test studies, in collaboration with systematic review writers who will act as experts and end-users.

Annotated data will be used to train machine learning modules that automatically detect diagnostic test studies and their characteristics. These are information extraction problems, which may be approached by using named entity recognition to inform classification models. The resulting machine learning modules will be used to make automatic predictions both across studies at large and in the context of systematic review writing. Issues regarding the efficient integration of the tool within the systematic review creation workflow will be studied, and the effects of the tool on the literature coverage of systematic reviews will be assessed.

*Expected results*

The resulting algorithms can be applied to the available data sources in order to create a comprehensive repository of information on diagnostic test studies. We expect this effort to benefit database curators by supporting their curation work, and to benefit clinicians and systematic review writers by providing a comprehensive characterization of studies addressing specific diagnostic tests.


[1] Leeflang MM, Deeks JJ, Takwoingi Y, Macaskill P. Cochrane diagnostic test accuracy reviews. Systematic Reviews. 2013;2:82. doi:10.1186/2046-4053-2-82.

[2] Petersen H, Poon J, Poon SK, Loy C. Increased workload for systematic review literature searches of diagnostic tests compared with treatments: challenges and opportunities. JMIR Med Inform. 2014 May 27;2(1):e11.

[3] Manning CD, Schütze H. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press; 1999.

--
Aurélie Névéol, PhD

LIMSI-CNRS
Bâtiment 508, bureau RdC010
Rue John von Neumann
Université Paris-Sud
91403 ORSAY
France

Tel: +33 (0)1 69 85 80 10
