[Corpora-List] [CFP] Test set release: CL-Scientific Summarization @ SIGIR 2018. Last call!

#KOKIL JAIDKA# KOKI0001 at e.ntu.edu.sg
Tue May 1 14:42:52 CEST 2018


The 4th Computational Linguistics Scientific Summarization Shared Task, CL-SciSumm-18 @ SIGIR 2018: http://wing.comp.nus.edu.sg/~cl-scisumm2018/

and the associated Bibliometrics and IR workshop: http://wing.comp.nus.edu.sg/~birndl-sigir2018/

Updates:

1. Test set released! 20 citation-reference sets are now up in the GitHub repository.

2. Invited Speaker: Dr. Byron Wallace, Northeastern University (http://www.byronwallace.com/).

The Shared Task comprises three sub-tasks in automatic research paper
summarization on a new corpus of research papers. This task is expected
to be of interest to a broad community, including those working in CL
and NLP, especially in the sub-disciplines of text summarization,
natural language generation, text reuse, discourse structure in
scholarly discourse, paraphrase, textual entailment and text
simplification.

=== The Task ===

Given: A topic consisting of a Reference Paper (RP) and ten or more
Citing Papers (CPs) that all contain citations to the RP. In each CP,
the text spans (i.e., citances) that pertain to a particular citation
to the RP have been identified.

Task 1a: For each citance, identify the spans of text (cited text
spans) in the RP that most accurately reflect the citance. These are
of the granularity of a sentence fragment, a full sentence, or several
consecutive sentences (no more than 5).

Task 1b: For each cited text span, identify what facet of the paper it
belongs to, from a predefined set of facets.

Task 2 (optional bonus task): Generate a structured summary of the RP
from the cited text spans of the RP. The length of the summary should
not exceed 250 words.
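For readers new to the task, the following is a minimal illustrative sketch, not the organizers' baseline or evaluation code. It assumes the RP is already split into a list of sentences, uses plain word-overlap (Jaccard) similarity as a stand-in for whatever matching method a real system would use, and all function names (`tokens`, `jaccard`, `best_span`, `truncate_summary`) are hypothetical. It shows the two hard constraints stated above: a cited text span of at most 5 consecutive sentences (Task 1a) and a summary of at most 250 words (Task 2).

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric word tokens in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_span(citance, rp_sentences, max_len=5):
    """Pick the window of at most max_len consecutive RP sentences that
    overlaps most with the citance.

    Returns (start, end) sentence indices, inclusive, or None when no
    window shares any word with the citance.
    """
    cit = tokens(citance)
    best_score, best = 0.0, None
    for start in range(len(rp_sentences)):
        window = set()
        stop = min(start + max_len, len(rp_sentences))
        for end in range(start, stop):
            window |= tokens(rp_sentences[end])
            score = jaccard(cit, window)
            if score > best_score:
                best_score, best = score, (start, end)
    return best


def truncate_summary(sentences, max_words=250):
    """Join whole sentences in order, stopping before the word budget
    (whitespace-separated words) would be exceeded."""
    kept, count = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if count + n > max_words:
            break
        kept.append(sentence)
        count += n
    return " ".join(kept)


if __name__ == "__main__":
    rp = [
        "We propose a new parser.",
        "The parser uses a beam search.",
        "Results improve on the baseline.",
    ]
    citance = "Smith et al. use beam search in their parser."
    span = best_span(citance, rp)
    print(span)
    if span is not None:
        start, end = span
        print(truncate_summary(rp[start:end + 1]))
```

Word overlap is only a placeholder here; the point of the sketch is the window enumeration and the length budget, which any stronger similarity function could be dropped into.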

=== Important Dates ===

March 19: Training set posted

May 1: Test set posted

May 4: Extended deadline for short system descriptions

May 20: System runs from the test set due

May 27: System reports (paper) due

June 25: Camera ready contributions due

July 12, 2018: Participants present at the BIRNDL 2018 workshop in Ann Arbor, MI, USA

=== The Corpus ===

The CL-SciSumm corpus is created by randomly sampling documents from
the ACL Anthology corpus and selecting their citing papers. Citing
papers may include papers from outside the Anthology. For CL-SciSumm
2018, we have selected three portions of this source collection to be
annotated and serve as training, development and test collections. The
training set of articles is available for download at GitHub
<https://github.com/WING-NUS/scisumm-corpus> and can be used by
participants to pilot their systems. Watch the GitHub repository for
updates, as we will be posting announcements and new files there. The
system outputs from the test set should be submitted to the task
organizers, for the collation of the final results to be presented at
the workshop.

=== Registration ===

Organizations wishing to participate in the CL Shared Task track at
BIRNDL 2018 are invited to register on EasyChair:
<https://easychair.org/conferences/?conf=birndl2018> by April 8th with
a tentative abstract. Please prefix “CLSciSumm Shared Task: ” to the
title of your submission. Participants are advised to register as soon
as possible in order to receive timely access to evaluation resources,
including training, development and testing data. Registration for the
task does not commit you to participation, but it helps the organizers
with planning. All participants who submit system runs are welcome to
present their systems as posters/selected presentations at the
BIRNDL 2018 Workshop in Ann Arbor, MI, USA.
