[Corpora-List] Announcement of Data Release and Call for Participation

nlp06 at albany.edu nlp06 at albany.edu
Thu Feb 7 19:20:47 CET 2008

Announcement of Data Release and Call for Participation

Second i2b2 Shared Task and Workshop

Challenges in Natural Language Processing for Clinical Data

Informatics for Integrating Biology and the Bedside (i2b2), a National Center for Biomedical Computing, is ready to release fully de-identified discharge records for its Second Shared Task!

In collaboration with State University of New York at Albany, MIT Computer Science and Artificial Intelligence Laboratory, and Partners Healthcare System, i2b2 is pleased to announce the Second Shared Task and Workshop on Challenges in Natural Language Processing for Clinical Data.

Data Release and Preliminary Call for Participation.

The Second i2b2 Shared Task on Challenges in Natural Language Processing for Clinical Data opens to preregistration on February 1, 2008. The 2008 Challenge is a multi-class, multi-label classification task focused on obesity and its co-morbidities. The data for the challenge consists of discharge summaries from Partners Healthcare. All records have been fully de-identified and annotated for obesity and co-morbidities.

Training data for the 2008 Challenge will be released in installments; the first installment will be released on March 15, 2008, and the remaining installments will follow soon after. Test data is scheduled to be released for a period of only three days and will be used for evaluation purposes only. The results of the shared-task challenge will be presented at the workshop organized by i2b2 (date and location are TBA).

Data will be released under a Data Use Agreement and is to be used for the Challenge only. Obtaining the data requires completing a preregistration and signing the Data Use Agreement. All members of a team are requested to sign the Data Use Agreement.

Evaluation Dates, File Formats, and Evaluation Metrics.

The Obesity challenge evaluation will be performed on the test data only. The participating teams are asked to stop development as soon as they download the test data. System output on the test data is to be returned to the organizers for evaluation through this website within three days of test data release. Each team is allowed to upload up to three system runs. System output is expected only in the form of standoff annotations, following the exact format of the ground truth annotations provided by i2b2. We are unable to evaluate output that does not comply with this standard. Precision, recall, and F-measure (beta = 1), computed per class, will be used as evaluation metrics.
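For teams who want to sanity-check their systems on the training data, the per-class metrics named above can be sketched roughly as follows. This is only an illustration and not the organizers' evaluation script: the function name `per_class_prf` and the labels "Y", "N", and "U" are hypothetical, and it assumes each record carries exactly one gold label and one predicted label for a given disease.

```python
from collections import Counter


def per_class_prf(gold, predicted, beta=1.0):
    """Return {label: (precision, recall, f_measure)} for paired label lists.

    Illustrative sketch only; labels and function name are assumptions,
    not part of the official i2b2 evaluation.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct judgment for class g
        else:
            fp[p] += 1   # system claimed class p wrongly
            fn[g] += 1   # system missed class g

    scores = {}
    b2 = beta * beta
    for label in sorted(set(gold) | set(predicted)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = b2 * precision + recall
        # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 gives F1
        f_measure = (1 + b2) * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


if __name__ == "__main__":
    # Hypothetical judgments for one disease over five records.
    gold = ["Y", "Y", "N", "N", "U"]
    predicted = ["Y", "N", "N", "N", "U"]
    for label, (p, r, f) in per_class_prf(gold, predicted).items():
        print(f"{label}: P={p:.3f} R={r:.3f} F={f:.3f}")
```

With beta = 1 the F-measure reduces to the harmonic mean of precision and recall. How scores are aggregated across classes or diseases is not specified in this announcement, so the sketch stops at per-class values.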

Participants are asked to submit a short paper describing their system and analyzing their performance. Papers should be in AMIA style and should not exceed five pages. Authors of top-performing systems and of particularly novel approaches will be invited to present or demo their systems at the workshop. All submissions will be considered for publication in a special issue of JAMIA.

Tentative Schedule:

February 1, 2008: Preregistration Open
March 15, 2008: Training Data Release
April 15, 2008: Commitment to Participate in Challenge
June 23, 2008: Test Data Release at 9am EST
June 25, 2008: Output Due at Midnight EST
August 1, 2008: Notification of Results to Each Participant
September 1, 2008: Final Reports Due
October 1, 2008: Invitations to Present at the Workshop
November, 2008: Workshop (Pending approval by AMIA)

Organizing Committee:

Ozlem Uzuner, SUNY at Albany, Chair
Peter Szolovits, MIT CSAIL
Isaac Kohane, Partners Healthcare

Please see the FAQs and announcements (see side bar) for more information. Questions on the shared task should be addressed to Ozlem Uzuner, i2b2nlp at albany.edu.
