We would like to share two social media datasets that can help you 1) evaluate new machine learning models and 2) design models that assist crisis responders in processing humanitarian information.
1. HumAID: We provide a new large-scale dataset with ~77K human-labeled tweets, sampled from a pool of ~24 million tweets across 19 disaster events that happened between 2016 and 2019.
"HumAID: Human-Annotated Disaster Incidents Data from Twitter with Deep Learning Benchmarks"
paper: https://arxiv.org/abs/2104.03090
dataset: https://crisisnlp.qcri.org/humaid_dataset, https://doi.org/10.7910/DVN/A7NVF7
2. CrisisBench: We consolidate eight existing human-annotated crisis-related datasets and provide 166.1k and 141.5k tweets for the informativeness and humanitarian classification tasks, respectively.
"CrisisBench: Benchmarking Crisis-related Social Media Datasets for Humanitarian Information Processing"
paper: https://arxiv.org/abs/2004.06774
dataset: https://crisisnlp.qcri.org/crisis_datasets_benchmarks, https://doi.org/10.7910/DVN/G98BQG
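Before training on either dataset, it can help to check the class balance of a split. Below is a minimal sketch that assumes a split is a tab-separated file with a header row and a label column named `class_label`. The file layout and column name are assumptions, so check the README shipped with each download. `label_distribution` and `SAMPLE_TSV` are hypothetical names, and the three rows are toy placeholders, not real data.

```python
import csv
import io
from collections import Counter

# Toy stand-in for one split file. The tab-separated layout and the
# "class_label" column name are assumptions, not the documented format.
SAMPLE_TSV = (
    "tweet_id\ttweet_text\tclass_label\n"
    "1\tVolunteers needed at the shelter\trescue_volunteering_or_donation_effort\n"
    "2\tNice weather today\tnot_humanitarian\n"
    "3\tDonate water and blankets here\trescue_volunteering_or_donation_effort\n"
)


def label_distribution(tsv_text: str, label_column: str = "class_label") -> Counter:
    """Count how many rows carry each label in a tab-separated text with a header row."""
    # QUOTE_NONE keeps quote characters inside tweet text from being parsed as field quoting.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[label_column] for row in reader)


if __name__ == "__main__":
    counts = label_distribution(SAMPLE_TSV)
    for label, n in counts.most_common():
        print(f"{label}\t{n}")
```

For a real split, read the file with `open(path, encoding="utf-8", newline="")` and pass its contents in, adjusting `label_column` to whatever the actual header uses.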
Best
................
Firoj Alam, PhD
http://sites.google.com/site/firojalam/