[Corpora-List] A corpus of human interaction and a POS tagger

Mark Davies Mark_Davies at byu.edu
Fri May 12 14:23:13 CEST 2017



> We are looking for a corpus of human everyday interactions (e.g. blogs, forums, twitter) with a reliable POS tagging or an untagged corpus and a good reliable POS tagger to run on it.

There are about 600 million words of blog text (tagged with CLAWS 7) in the GloWbE corpus:

http://corpus.byu.edu/glowbe/?f=texts

It's available via the web interface and as a download: http://www.corpusdata.org/?
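If you work from the download, you can summarize a tagged file in a few lines of Python. The sketch below is minimal and rests on an assumption: the file is tab-separated, with one token per line and columns word, lemma, PoS. The actual layout of the GloWbE download files may differ, so adjust `pos_column` to match. The function name `count_pos_tags` and the sample lines are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_pos_tags(lines: Iterable[str], pos_column: int = 2) -> Counter:
    """Tally PoS tags from tab-separated, one-token-per-line input.

    Assumes a hypothetical word<TAB>lemma<TAB>PoS layout; change
    pos_column if the real files order their columns differently.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= pos_column:
            continue  # skip blank or malformed lines
        tag = fields[pos_column].strip()
        if tag:
            counts[tag] += 1
    return counts


if __name__ == "__main__":
    # Illustrative lines using CLAWS 7 tags (AT = article, NN1/NN2 = singular/plural noun)
    sample = [
        "the\tthe\tAT\n",
        "blog\tblog\tNN1\n",
        "posts\tpost\tNN2\n",
        "\n",
        "the\tthe\tAT\n",
    ]
    print(count_pos_tags(sample).most_common(3))
```

To process a real file, pass an open file handle instead of the sample list, e.g. `count_pos_tags(open(path, encoding="utf-8"))`.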

Mark Davies

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================

________________________________
From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of Eva Kimel <eva.kelman at gmail.com>
Sent: Wednesday, May 10, 2017 4:58 AM
To: corpora at uib.no
Cc: Allon Vishkin
Subject: [Corpora-List] A corpus of human interaction and a POS tagger

Hello,

We are looking for a corpus of everyday human interactions (e.g. blogs, forums, Twitter) with reliable POS tagging, or an untagged corpus and a good, reliable POS tagger to run on it.

If anyone has any recommendations, we would be very happy to hear them :)

With kind regards,

--
Eva Kimel

Ph.D. Candidate, ELSC, The Hebrew University of Jerusalem


