[Corpora-List] Announcement of Data Release and Call for Participation

Ozlem Uzuner ozlem
Mon Apr 6 12:45:42 CEST 2009

Announcement of Data Release and Call for Participation

Third i2b2 Shared-Task and Workshop
Challenges in Natural Language Processing for Clinical Data
Medication Extraction Challenge

Data Release: 1 June, 2009
Evaluation: August, 2009
Paper Submission: 1 September, 2009
Workshop: November, 2009 in San Francisco, CA
URL: i2b2.org/NLP

********* Register NOW at i2b2.org/NLP **********

Organizer: Informatics for Integrating Biology and the Bedside, i2b2, a National Center for Biomedical Computing

The medication extraction challenge aims to encourage the development of natural language processing systems that extract medication-related information from narrative patient records. The information targeted includes medications, dosages, modes of administration, frequency of administration, and the reason for administration. To encourage the development of semi-supervised and unsupervised systems for medication extraction, the development data for the challenge will be distributed unannotated. Participants will be allowed to create their own annotations. For this purpose, annotation guidelines and sample annotated records will be provided.

The challenge opens for registration on April 1, 2009. Development data for the challenge will be released in June. Test data are scheduled to be available for only three days and will be used only for evaluation purposes. The results of the challenge will be presented at the workshop organized by i2b2.

Data for the medication extraction challenge will be released under a Data Use Agreement. Obtaining the data requires completing a registration and signing the Data Use Agreement. Downloading the data implies commitment on the part of the downloading team to participate in the medication extraction challenge. Data can be kept and used for research purposes beyond the duration of the challenge.

Evaluation Dates, File Formats, and Evaluation Metrics.

The medication extraction challenge is inspired by the Question Answering track of the Text REtrieval Conference (TREC) of NIST. Following the standards of NIST, evaluation will be on the test data, and the evaluation metrics will resemble those of NIST. Participating teams are asked to stop development as soon as they download the test data. Each team is allowed to upload up to three system runs through the i2b2 website. System output is expected in the form of standoff annotations, following the exact format of the ground truth annotations to be provided by i2b2.

Test data will be annotated by the challenge participants. After uploading their system outputs to the i2b2 website, each team will be asked to annotate 10 records/person. Multiple annotations for each record will be obtained before finalizing the ground truth. Downloading the training data constitutes commitment on the part of the challenge participants to annotate 10 records/person from the test data.

Participants are asked to submit a short paper describing their system and analyzing their performance. Papers should be in AMIA style and should not exceed five pages. Authors of top performing systems and of particularly novel approaches will be invited to present or demo their systems at the workshop. A journal special issue will be organized for a subset of the top ten systems.

Tentative Schedule
April 1, 2009: Registration Open
June, 2009: Development Data Release
August, 2009: Test Data Release at 9am EST
October, 2009: Notification of Results to Each Participant
November, 2009: Workshop

Organizing Committee:

Ozlem Uzuner, Chair, SUNY at Albany; Middle East Technical University Northern Cyprus Campus
Imre Solti, University of Washington
Peter Szolovits, MIT CSAIL
Isaac Kohane, Partners HealthCare

Please see the FAQs and announcements for more information. Questions on the challenge can be addressed to Ozlem Uzuner, i2b2nlp at albany.edu.
