[Corpora-List] A corpus of text messages

Tao Chen taochen at comp.nus.edu.sg
Wed Apr 18 10:36:59 CEST 2012

Hi Thapelo, all:

Greetings from NUS. My name is Tao Chen, and I am a second-year Ph.D. student working on SMS corpus collection.

So far we have collected 41,317 English and 29,533 Chinese SMS messages, and have released the corpus and its summary statistics on our corpus website.


We have also written a technical report describing our data collection efforts, together with a comprehensive literature review of existing SMS corpora. You can find the paper at http://arxiv.org/abs/1112.2468. (Thanks to Nancy for the pointer!)

Our corpus is still a live project. As such, we encourage you and any interested community members to contribute to it. Please visit our corpus website for more information on how to contribute.


Tao Chen
on behalf of the Web IR / NLP Group (WING) at NUS
http://www.comp.nus.edu.sg/~taochen/
