The Second International Chinese Word Segmentation Bakeoff
Preliminary Description and Important Dates

1. Introduction

This is the initial announcement for the Second International Chinese
Word Segmentation Bakeoff, sponsored by the Special Interest Group for
Chinese Language Processing (SIGHAN) of the Association for
Computational Linguistics. The bakeoff will occur over the summer of
2005 and the results will be presented at the 4th SIGHAN Workshop,
to be held at IJCNLP'05, October 14-15.

The first bakeoff, held in 2003 and presented at the 2nd SIGHAN
Workshop at ACL 2003 in Sapporo, has become the pre-eminent measure
for Chinese word segmentation evaluation and has been cited in
numerous papers. As with the first evaluation, the second bakeoff will
concentrate exclusively on Word Segmentation. Corpora from the
following organizations will be available for use:

- CKIP, Academia Sinica, Taiwan
- City University of Hong Kong, Hong Kong SAR
- CIS Department, University of Pennsylvania, United States
- Beijing University, China
- Microsoft Research, China

The exact nature of the segmentation tasks is being discussed and
final details will be made available when registration is opened on 1
June 2005.

Participants are required to submit a short paper describing their
system and analyzing its performance, and to present a summary at the
workshop. The reports will be published in the SIGHAN workshop
proceedings.

The language of the workshop is English. Papers must be submitted and
presented in English. Note that, unlike papers for the workshop proper,
the bakeoff reports will not be peer reviewed.

2. Important Dates

2005-06-01 Registration Open
2005-06-29 Training data made available
2005-07-27 Testing data made available
2005-07-29 Test results sent back to organizers
2005-08-05 Results privately reported to participants
2005-08-19 Final reports due from participants

3. Contact Information

The workshop is being organized by Tom EMERSON of Basis Technology
Corp. and Jianfeng GAO of Microsoft Research China.

The web page for the competition is


Questions on the bakeoff should be addressed to Tom Emerson,
tree at basistech.com.

Tom Emerson                                   Basis Technology Corp.
Software Architect                          http://www.basistech.com
