We have a YouTube threat corpus at UiO/HiOA* with sentence-level annotations of threats of violence (or sympathy with such threats). The data set comprises 28,643 sentences across 9,845 comments on 8 different YouTube videos. Please do get in touch if you are interested in using it for research.
* The corpus is jointly maintained by the Language Technology Group at the University of Oslo and the Oslo and Akershus University College of Applied Sciences. The first version of the data set was compiled by Hugo Lewi Hammer and described in "Detecting threats of violence in online discussions using bigrams of important words" (2014), while more recent work will be presented in an upcoming paper at the WASSA workshop at NAACL-HLT 2016: "Threat detection in online discussions" by Aksel Wester, Lilja Øvrelid and Erik Velldal.
On Thu, Apr 28, 2016 at 2:01 PM, Mario Crespo Miguel <mario.crespo at uca.es> wrote:
> Dear members of corpora list,
> I wonder if you could give me good advice about a corpus/corpora of threat
> or/and hate speeches (spoken or written). Languages we are interested in
> are English, Spanish, French, Portuguese, Italian or Arabic.
> thank you very much in advance,
> Mario Crespo
> Mario Crespo Miguel
> Profesor Sustituto Interino
> Área de Lingüística, Departamento de Filología
> Universidad de Cádiz
Erik Velldal
Associate professor
Language Technology Group
Department of Informatics, University of Oslo