[Corpora-List] Call for Participation: COVIDSearch

Kirk Roberts kirkroberts at gmail.com
Wed Mar 25 02:03:53 CET 2020

Researchers, clinicians, and policy makers involved with the response to COVID-19 are constantly searching for reliable information on the virus and its impact. This presents a unique opportunity for the information retrieval (IR) and text processing communities to contribute to the response to this pandemic, as well as to study methods for quickly standing up such systems for similar future events.

Recently, the Allen Institute for AI <https://allenai.org/> and collaborators announced the availability of an open dataset, the COVID-19 Open Research Dataset (CORD-19) <https://pages.semanticscholar.org/coronavirus-research>. This collection of biomedical literature articles currently contains over 40,000 articles and will be updated weekly.

We are announcing an IR challenge for search engines that find relevant COVID-related articles within this collection. This challenge will provide:

- A benchmark set of important COVID-related queries (e.g., "coronavirus risk factors", "COVID-19 ibuprofen")

- A set of manual judgments for CORD-19 articles on these queries

- An ongoing leaderboard for comparison of IR systems

The challenge may in the future expand to more detailed tasks such as information-filtering, question-answering, fact-checking, and argument mining.

The current plan is to run the competition in weekly batches, where that week's snapshot of CORD-19 is used as the corpus and the results of systems participating in that batch are pooled for manual assessment. The task will follow the "Cranfield" evaluation procedures that are used in the Text Retrieval Conference (TREC) <https://trec.nist.gov/> and related challenge evaluations.
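As a rough illustration of the pooling step in a Cranfield-style evaluation, the sketch below takes the top-ranked documents from each participating system and merges them, per topic, into a single set for assessors to judge. This is only a minimal sketch: the pool depth, the run format, and all names (`build_pool`, `system_a`, `topic1`, the document IDs) are assumptions made for illustration and are not specified by the challenge.

```python
from typing import Dict, List, Set

# Hypothetical run format: system name -> topic id -> ranked list of document ids.
Runs = Dict[str, Dict[str, List[str]]]


def build_pool(runs: Runs, depth: int) -> Dict[str, Set[str]]:
    """Union the top-`depth` documents of every run, separately for each topic."""
    pool: Dict[str, Set[str]] = {}
    for ranked_by_topic in runs.values():
        for topic, ranking in ranked_by_topic.items():
            # Only the first `depth` results of each system enter the pool.
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool


# Made-up example data: two systems, one topic.
runs: Runs = {
    "system_a": {"topic1": ["d3", "d1", "d7"]},
    "system_b": {"topic1": ["d1", "d9", "d2"]},
}
pool = build_pool(runs, depth=2)
print(sorted(pool["topic1"]))
```

Documents outside the pool are left unjudged, so the choice of depth trades assessor effort against the completeness of the judgments.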

One way we will build topics for the test collection is to solicit them through crowdsourcing on Twitter. Please reply to our tweets using the hashtag *#COVIDSearch*. (We will assess all nominations and incorporate those that best fit the task.)

The goal of this retrieval challenge is both to help develop systems capable of identifying relevant information for the current pandemic and to study scientifically how retrieval methods can be developed quickly for such situations in the future.

Participants in this project include:

- Ian Soboroff, National Institute of Standards and Technology (NIST)

- Ellen Voorhees, National Institute of Standards and Technology (NIST)

- Dina Demner-Fushman, National Library of Medicine

- William Hersh, Oregon Health & Science University

- Kirk Roberts, University of Texas Houston Health Science Center

- Lucy Lu Wang, Allen Institute for AI

- Kyle Lo, Allen Institute for AI

- Steven Bedrick, Oregon Health & Science University

- Aaron Cohen, Oregon Health & Science University

Initial (working) webpage: https://dmice.ohsu.edu/hersh/COVIDSearch.html

More to follow soon!