[Corpora-List] A corpus of text messages

Dr. Thapelo J. Otlogetswe otlogetswe at gmail.com
Fri Apr 20 05:26:30 CEST 2012


Thanks, colleagues, for the excellent responses to my query. Your responses and links have helped clarify a number of issues with which we were grappling. If we have follow-up questions, I will contact some of you privately.

Thanks again,
Thapelo

On 18 April 2012 10:36, Tao Chen <taochen at comp.nus.edu.sg> wrote:
> Hi Thapelo, all:
>
> Greetings from NUS. My name is Tao Chen, a second-year Ph.D. student
> working on SMS corpus collection.
>
> So far we have collected 41,317 English SMS and 29,533 Chinese SMS,
> and have released the corpus and its summary statistics on our corpus
> website.
>
> http://wing.comp.nus.edu.sg/SMSCorpus/
>
> Also, we have written a technical report about our data collection
> efforts, as well as a comprehensive literature review of the existing
> SMS corpora. You can find the paper at http://arxiv.org/abs/1112.2468.
> (Thanks to Nancy for the pointer!)
>
> Our corpus is still a live project. As such, we encourage you and any
> interested community members to contribute to it. Please visit our
> corpus website for more information about contributing.
>
> Sincerely,
>
> Tao Chen
> on behalf of the Web IR / NLP Group (WING) at NUS
> http://www.comp.nus.edu.sg/~taochen/
>
>

-- • Dr. Thapelo J. Otlogetswe • http://otlogetswe.wordpress.com • "He is no fool who gives what he cannot keep to gain that which he cannot lose." - Jim Elliot


