[Corpora-List] Social Media Corpora collected during 2013-2015

LEE, kayee [12118192d] kayee.lee at connect.polyu.hk
Mon Sep 7 15:51:29 CEST 2015

Thanks, Rob, for providing so many useful links. They will be a valuable asset to my lexical study, which looks at new usages, new spellings and new words in social media.

If anyone is doing a similar study, you can go through the links provided below.

I am looking for a few more social media corpora. If you know of any other social media corpora collected during 2013-2015 and compiled in English, please let me know.


Kayee LEE

________________________________
From: rob van der goot <robvanderg at live.nl>
Sent: 31 August 2015, 12:53 AM
To: LEE, kayee [12118192d]
Subject: RE: [Corpora-List] Looking for a social media corpora collected in 2013-2015 (Kayee LEE KA LAM)

Dear Kayee,

Those files move all the time, so here are the updated links:

Lexnorm (old, from before 2013 I think): http://people.eng.unimelb.edu.au/tbaldwin/etc/lexnorm_v1.2.tgz
Lexnorm 2015 (not in the overview, but newer): https://noisy-text.github.io/files/lexnorm2015.tgz
SMS messages (I think these are also from before 2013, but in case you are still interested): http://www.comp.nus.edu.sg/~nlp/corpora.html
POS-tagged tweets: bit.ly/twitter-bootstrap-corpus

Another interesting corpus might be the encow corpus (from 2014): https://webcorpora.org/
Or you can always collect your own tweets: https://dev.twitter.com/rest/public

For some of the corpora you do have to contact the creators.

Good luck with them,
Rob van der Goot



