[Corpora-List] ACL Anthology Reference Corpus, Version 2 -- and solicitation for tasks and ideas

Min-Yen Kan knmnyn at gmail.com
Wed Aug 5 09:17:26 CEST 2015

Dear Corpora List members:

(Apologies for the cross-posting)

The Association for Computational Linguistics (ACL) has a long history of publishing its scholarly works under a permissive license that allows for open source sharing for most purposes. The archives of these works have been available in the ACL Anthology (http://www.aclweb.org/anthology) for a number of years, for anyone to read and re-use.

To better serve our own community in corpus linguistics, we plan to release a machine-readable version of all of the scholarly publications in the ACL Anthology, containing the text and logical document formatting of the articles. It should be available within the next few months and will be announced here as well.

At this stage, we would like to solicit ideas for shared tasks or workshop themes that would involve the scholarly materials in the ACL Anthology. Suggestions so far include tasks on document retrieval, document summarization, keyphrase extraction, or sentiment analysis.

A significant difficulty is the annotation of ground truth for any of these tasks. Without a funding source, we plan to ask participants to do pooled annotation of system results, in the style of TREC.

With this post, we hope to seed the discussion about such a dataset and task, with the objective of building up a community-initiated workshop in the 2016/2017 timeframe.

Thank you for your attention!

- Min-Yen Kan, ACL Anthology Editor
