[Corpora-List] Datasets

Firoj Alam firojalam at gmail.com
Mon Jul 19 21:48:05 CEST 2021

Dear Colleagues,

We would like to share two social media datasets that can help you 1) evaluate new machine learning models and 2) design models that assist crisis responders with humanitarian information processing.

1. HumAID: We provide a new large-scale dataset with ~77K human-labeled tweets, sampled from a pool of ~24 million tweets across 19 disaster events that happened between 2016 and 2019.

"HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks"

paper: https://arxiv.org/abs/2104.03090

dataset: https://crisisnlp.qcri.org/humaid_dataset, https://doi.org/10.7910/DVN/A7NVF7

2. CrisisBench: We consolidate eight existing human-annotated crisis-related datasets and provide 166.1k and 141.5k tweets for the informativeness and humanitarian classification tasks, respectively.

"CrisisBench: Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing"

paper: https://arxiv.org/abs/2004.06774

dataset: https://crisisnlp.qcri.org/crisis_datasets_benchmarks, https://doi.org/10.7910/DVN/G98BQG



Firoj Alam, PhD

http://sites.google.com/site/firojalam/
