[Corpora-List] Looking for a gold standard dataset of computer science articles to benchmark an IR system

Bahar Sateli sateli at semanticsoftware.info
Thu Jan 5 19:57:50 CET 2017

Dear Corpora-List,

I am looking for any available datasets of scientific literature in the computer science domain (excluding biomedical documents) with a set of corresponding queries to help me evaluate our research on semantic retrieval of scientific literature.

Ideally, we’d prefer datasets of open-access articles that we can subsequently redistribute with semantic annotations. We are already aware of datasets such as ACL and CORE, but as far as I understand, they do not contain any gold standard information.

Very Best,

