The goal of this task is to extract the spans of medication mentions in tweets. The dataset consists of all tweets posted by 212 Twitter users. It therefore reflects the natural, highly imbalanced distribution of drug mentions on Twitter: only about 0.2% of the tweets mention a medication. Training and evaluating a sequence labeler on this dataset closely models how drugs would be detected in tweets in practice. See BioCreative VII, Task 3 <https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/> for more information.
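For readers new to span extraction with a sequence labeler, here is a minimal sketch of one common setup. It assumes that the labeler emits one BIO tag per token and that the tags are grouped into medication spans afterwards. The tag name `DRUG`, the function `bio_to_spans`, and the example tokens are hypothetical. They are not taken from the task's baseline or its official data format. Because only about 0.2% of tweets mention a medication (roughly 400 of ~200,000), a labeler that outputs `O` everywhere would still label about 99.8% of tweets correctly while finding no medications. Span-level precision and recall are therefore more informative here than per-tweet accuracy.

```python
from typing import List, Optional, Tuple

# (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Group per-token BIO tags into (start, end, label) spans; end is exclusive.

    Hypothetical helper: an I- tag that does not continue a span with the
    same label is treated as the start of a new span (a lenient convention,
    assumed here, not the task's official rule).
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # Close any open span, then open a new one at this token.
            if label is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag == "O":
            # Outside any mention: close the open span, if there is one.
            if label is not None:
                spans.append((start, i, label))
            start, label = None, None
        # Otherwise the tag is I- with the current label: the span continues.
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


if __name__ == "__main__":
    # Hypothetical tokenized tweet and labeler output.
    tokens = ["took", "some", "tylenol", "and", "advil", "PM"]
    tags = ["O", "O", "B-DRUG", "O", "B-DRUG", "I-DRUG"]
    spans = bio_to_spans(tags)
    print(spans)
    print([" ".join(tokens[s:e]) for s, e, _ in spans])
```

On the example above, this prints `[(2, 3, 'DRUG'), (4, 6, 'DRUG')]` followed by `['tylenol', 'advil PM']`. Span offsets are given in tokens. The task's official submission format may instead require character offsets in the tweet text.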
In short:
• 212 Twitter user timelines (~200,000 tweets) annotated with the spans of medications (the timelines keep their natural, highly imbalanced distribution: ~0.2% of the tweets mention a medication)
• Additional balanced dataset of tweets available (4,975 positive / 4,648 negative)
• Baseline system available
• CodaLab competition open at https://competitions.codalab.org/competitions/23925
• Evaluation period: Sept. 1st, 9:00 UTC - Sept. 4th, 23:59 UTC
[Apologies for cross-posting]
Best regards,
Davy