[Corpora-List] All English Text Messaging Corpus?

Susana M. Sotillo sotillos at mail.montclair.edu
Mon Apr 11 18:52:52 CEST 2011


I was also wondering about the request that excludes SMS text messages from family, friends, and neighbors. All SMS text messages, except those sent by police departments, EMTs, or medical technicians (or drones), could be classified as "personal." I have a collection of 6000+ text messages, but these all come from individuals who donated their "personal" and "work-related" text messages. As I mentioned to someone else on the list, my institution's IRB guidelines required me to delete all personal information (phone numbers, names, embedded phone numbers, and any other sensitive information) before undertaking any kind of analysis. Scrubbing the data is very time-consuming. One of my informants was a Wall Street trader, but he donated only his "personal" text messages, or messages that did not disclose any interesting financial transactions.

If someone is really interested in obtaining a very large sample of SMS text messages in "American" English from a variety of geographic and socioeconomic backgrounds, he or she ought to contact AT&T or Verizon. My understanding is that they store all SMS text messages for five years in case someone is accused of making terroristic threats or engaging in illegal insider trading. There might be a cost associated with this.

----- Original Message ----- From: Vivian Tsang <vyctsang at cs.toronto.edu> Date: Monday, April 11, 2011 11:14 am Subject: Re: [Corpora-List] All English Text Messaging Corpus? To: Corpora list <corpora at uib.no>

Shrug. I'm writing to support the argument against Rich. Patent claims have nothing to do with text messaging; they're entirely different. If I asked for information about cookbooks and recipes and someone steered me to their collection of movie reviews, I wouldn't find that helpful at all.


