[Corpora-List] All English Text Messaging Corpus?

John.Osborne at univ-savoie.fr
Mon Apr 11 17:03:08 CEST 2011

Dear Laura,

The National University of Singapore is working on a corpus of SMS messages. For information, and to consult or download the corpus to date, see: http://wing.comp.nus.edu.sg:8080/SMSCorpus/

You might also like to have a look at the sms4science project. See here for the English version of their site: http://www.sms4science.org/?q=en


> On 4/11/2011 9:43 AM, Khurshid Ahmad wrote:
>> Dear Laura
>> I am writing to support Rich.
>> The USPTO documents are in Legal English and are written by Patent
>> Attorneys - these documents form a representative sample of American
>> English and to a lesser extent that of other national varieties of written
>> English.
> Uh, people? Laura asked for a text messaging corpus. As in
> short, presumably spontaneous messages. I think her query was very
> interesting, because we know that people use text and chat for many
> purposes besides personal ones. Such a corpus (preferably with
> timing data) would enable us to differentiate between variation due
> to formality and variation due to spontaneity and time pressure.
> Rich hijacked that query to promote patents (which is a
> completely different text type) and his own software project (which
> is not remotely necessary to analyze the publicly available patent
> database). Of course patents are worthwhile objects of study, but
> they're not text messages, and Rich's response was essentially spam.
> If anyone wants to promote their own project or suggest an area
> that deserves further study, go ahead, but start your own thread.
> --
> -Angus B. Grieve-Smith
> grvsmth at panix.com
