[Corpora-List] DEFT French NLP challenge: call for participation

Cyril Grouin cyril.grouin at limsi.fr
Thu Feb 13 16:20:33 CET 2020

*Défi Fouille de Textes (DEFT): Text Mining Challenge in French*
https://deft.limsi.fr/2020/ (English version: https://deft.limsi.fr/2020/index-en.html)

Following the DEFT 2019 challenge, the 2020 edition of the Défi Fouille de Textes (DEFT 2020) continues to explore clinical cases written in French. This edition addresses fine-grained information extraction for about a dozen categories, similar to the international I2B2 2009, 2012 and 2014 challenges and SemEval 2014. Beyond the clinical domain, we also propose two new tasks dedicated to semantic similarity between sentences.

*General information on the corpora*

One corpus used in this challenge is a subset of a larger corpus of clinical cases that comes with more complete annotations and associated information [1]. The clinical cases cover various medical specialties (cardiology, urology, oncology, obstetrics, pulmonology, gastroenterology, etc.) and were published in various French-speaking countries (France, Belgium, Switzerland, Canada, African countries, tropical countries, etc.).

Another corpus used in this challenge is a subset of the CLEAR corpus [2]. The CLEAR corpus contains three sub-corpora of documents with comparable contents (encyclopedia articles, drug leaflets, Cochrane summaries). For a given topic, each sub-corpus provides both technical and simple or simplified texts in French. The sentences for Tasks 1 and 2 are extracted from this corpus.

The reference data result from a consensus between two independent annotations.

[1] N. Grabar, V. Claveau, C. Dalloux. CAS: French Corpus with Clinical Cases. LOUHI 2018, pp. 1-7.
[2] N. Grabar, R. Cardon. CLEAR -- Simple Corpus for Medical French. ATA 2018, pp. 1-7.

*The challenge comprises the following tasks:*

- Task 1: identify the degree of similarity between pairs of parallel and non-parallel sentences from several areas

Purpose: determine the level of similarity between two sentences on a scale from 0 to 5
Input: pairs of sentences
Output: a similarity level between 0 and 5 for each pair of sentences
Evaluation: difference between the submitted value and the reference value
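The call only states that Task 1 is scored on the difference between submitted and reference similarity levels. To check a system locally before the test period, one could average that difference over all pairs. The sketch below assumes a mean absolute difference; this is our choice, not necessarily the official DEFT 2020 metric, and the function name and toy values are hypothetical.

```python
# Hypothetical local scorer for Task 1. The call only says evaluation uses the
# difference between submitted and reference values; averaging absolute
# differences over all pairs is an assumption, not the official DEFT 2020 metric.


def mean_absolute_difference(predicted: list[float], reference: list[float]) -> float:
    """Average absolute gap between predicted and reference similarity levels (0-5)."""
    if len(predicted) != len(reference):
        raise ValueError("exactly one prediction is needed per sentence pair")
    if not predicted:
        raise ValueError("no sentence pairs to score")
    for value in (*predicted, *reference):
        if not 0 <= value <= 5:
            raise ValueError(f"similarity level {value} is outside the 0-5 scale")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


if __name__ == "__main__":
    # Three invented sentence pairs, not DEFT data: gaps of 1, 0 and 1.
    print(mean_absolute_difference([4, 2, 0], [5, 2, 1]))
```

A lower value means the submitted levels are closer to the reference.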

- Task 2: identify possible parallel sentences for a given source sentence

Purpose: given a source sentence and several candidates, identify the candidate that is parallel to the source sentence
Input: one source sentence and several candidate sentences
Output: the parallel sentence corresponding to the source sentence
Evaluation: boolean (correct or incorrect)
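Task 2 is judged as a boolean for each source sentence. The following is a minimal local sketch. It assumes that answers are keyed by hypothetical sentence and candidate identifiers and that the booleans are summarized as an accuracy. Both points are assumptions and do not describe the official scoring or data format.

```python
# Hypothetical local scorer for Task 2. Each source sentence gets a boolean
# (right or wrong candidate); summarizing these booleans as an accuracy is an
# assumption, and the identifiers used here are illustrative, not the DEFT format.


def task2_accuracy(chosen: dict[str, str], gold: dict[str, str]) -> float:
    """Fraction of source sentences whose chosen candidate is the gold parallel one."""
    if not gold:
        raise ValueError("no source sentences to score")
    # One boolean per source sentence; a missing answer counts as wrong.
    correct = [chosen.get(source_id) == gold_id for source_id, gold_id in gold.items()]
    return sum(correct) / len(correct)


if __name__ == "__main__":
    gold = {"s1": "c3", "s2": "c1"}    # source id -> id of the true parallel sentence
    chosen = {"s1": "c3", "s2": "c2"}  # system answers: s1 right, s2 wrong
    print(task2_accuracy(chosen, gold))
```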

- Task 3: information extraction

Purpose: detect, in the clinical cases, fine-grained information related to about a dozen categories.

Four domains are covered:
  - related to patients: anatomy
  - related to clinical practice: tests, pathologies, signs and symptoms
  - related to drugs and surgery: substance, dosage, duration, frequency, mode of administration, treatment, value
  - related to temporality: date, moment

The annotation guidelines are available online: https://deft.limsi.fr/2020/guide-deft.html

Input: a set of clinical cases
Output: the information corresponding to the target categories in each clinical case
Evaluation: the information categories and text spans are compared with the reference data
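For Task 3, the call says that categories and text spans are compared with the reference data. The sketch below treats each annotation as a (category, start offset, end offset) triple. It counts only exact matches and reports micro-averaged precision, recall and F1. These are assumptions: the official evaluation may handle partial overlaps differently, and the `Span` type, labels and offsets are illustrative only.

```python
# Hypothetical local scorer for Task 3. Counting only exact (category, start, end)
# matches and micro-averaging precision/recall/F1 are assumptions; the official
# DEFT 2020 evaluation may, for example, also credit overlapping spans.
from typing import NamedTuple


class Span(NamedTuple):
    category: str  # illustrative label, e.g. "dosage" or "date"
    start: int     # character offset where the span begins
    end: int       # character offset just past the span's last character


def exact_match_prf(predicted: set[Span], reference: set[Span]) -> tuple[float, float, float]:
    """Micro precision, recall and F1; a hit needs the same category and offsets."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Invented offsets: the substance span is cut short, so only the dosage matches.
    reference = {Span("dosage", 18, 24), Span("substance", 28, 39)}
    predicted = {Span("dosage", 18, 24), Span("substance", 28, 35)}
    print(exact_match_prf(predicted, reference))
```

Exact matching is the strictest choice. A relaxed variant would accept a prediction whose category matches and whose offsets overlap a reference span.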

*Important dates:*

- Registration: from January 27th, 2020 until the beginning of the test period
- Release of the training data: January 27th, 2020
- Test: 3 days, to be chosen between April 24th and 30th
- Submission of papers (strict deadline): May 4th (first version), May 8th (final version)
- Workshop: June 8th or 9th, 2020, during JEP-TALN in Nancy, France

Access to the DEFT 2020 data is granted only after all team members have signed the user agreement (https://deft.limsi.fr/2020/accord-deft2020.pdf). Participants may take part in one or more tasks. By accessing the data, participants commit to submitting results for at least one task and to presenting their results at the workshop.
