[Corpora-List] Call for papers SDP at COLING2022

Tirthankar Ghosal tirthankar.slg at gmail.com
Fri Apr 29 08:14:36 CEST 2022

Dear colleagues,

You are invited to participate in the 3rd Workshop on Scholarly Document Processing (SDP 2022) to be held at COLING 2022 (October 12-17, 2022). The SDP 2022 workshop will consist of a Research track and six Shared Tasks. The call for research papers is described below, and more details can be found on our website, http://www.sdproc.org/.

Papers must follow the COLING format and conform to the COLING Submission Guidelines. The paper submission site will be provided on the workshop website shortly. The paper submission deadline is July 11, 2022.

Website: http://www.sdproc.org/
Twitter: https://twitter.com/sdproc
Mailing list: https://groups.google.com/g/sdproc-updates
CfP: https://sdproc.org/2022/cfp.html

** Call for Research Papers **


Although scientific literature plays a major part in research and policy-making, these texts represent an underserved area of NLP. NLP can play a role in addressing research information overload, identifying disinformation and its effect on people and society, and enhancing the reproducibility of science. The unique challenges of processing scholarly documents necessitate the development of specific methods and resources optimized for this domain. The Scholarly Document Processing (SDP) workshop provides a venue for discussing these challenges and bringing together stakeholders from different communities including computational linguistics, text mining, information retrieval, digital libraries, scientometrics, and others to develop and present methods and resources in support of these goals.

This workshop builds on the success of prior workshops: the 1st and 2nd SDP workshops held at EMNLP 2020 and NAACL 2021, and the 1st and 2nd SciNLP workshops held at AKBC 2020 and 2021. In addition to having broad appeal within the NLP community, we hope the SDP workshop will attract researchers from other relevant fields including meta-science, scientometrics, data mining, information retrieval, and digital libraries, bringing together these disparate communities within ACL.

Topics of Interest

We invite submissions from all communities demonstrating usage of and challenges associated with natural language processing, information retrieval, and data mining of scholarly and scientific documents. Relevant tasks include (but are not limited to):

Representation learning
Information extraction
Summarization
Language generation
Question answering
Discourse modeling and argumentation mining
Network analysis
Bibliometrics, scientometrics, and altmetrics
Reproducibility
Peer review
Search and indexing
Datasets and resources
Document parsing
Text mining
Research infrastructure, and others.

We specifically invite research on important and/or underserved areas, such as:

Identifying/mitigating scientific disinformation and its effects on public policy and behavior
Reducing information overload through summarization and aggregation of information within and across documents
Improving access to scientific papers through multilingual scholarly document processing
Improving research reproducibility by connecting scientific claims to evidence such as data, software, and cited claims

** Submission Information **

Authors are invited to submit full and short papers with unpublished, original work. Submissions will be subject to a double-blind peer-review process. Accepted papers will be presented by the authors at the workshop either as a talk or a poster. All accepted papers will be published in the workshop proceedings (proceedings from previous years can be found here: https://aclanthology.org/venues/sdp/).

The submissions must be in PDF format and anonymized for review. All submissions must be written in English and follow the COLING 2022 formatting requirements: https://coling2022.org/Cpapers

We follow the same policies as COLING 2022 regarding preprints and double-submissions. The anonymity period for SDP 2022 is from June 13 to August 22.

Long paper submissions: up to 9 pages of content, plus unlimited references.
Short paper submissions: up to 4 pages of content, plus unlimited references.

Final versions of accepted papers will be allowed 1 additional page of content so that reviewer comments can be taken into account.

More details about submissions are available on our website: http://www.sdproc.org/. To receive updates, please join our mailing list: https://groups.google.com/g/sdproc-updates or follow us on Twitter: https://twitter.com/sdproc

** Important Dates (Main Research Track) **

All paper submissions due – July 11, 2022
Notification of acceptance – August 22, 2022
Camera-ready papers due – September 5, 2022
Workshop – October 16/17, 2022

** SDP 2022 Keynote Speakers **

We are excited to have several keynote speakers at SDP 2022. The following speakers have been confirmed (others will be announced later).

Min-Yen Kan, NUS, Singapore (https://www.comp.nus.edu.sg/~kanmy/)
Sophia Ananiadou, University of Manchester, UK (https://www.research.manchester.ac.uk/portal/sophia.ananiadou.html)
Andrew Head, University of Pennsylvania, USA (https://andrewhead.info/)

** SDP 2022 Shared Tasks **

SDP 2022 will host six exciting shared tasks. More information about all shared tasks is provided on the workshop website: https://sdproc.org/2022/sharedtasks.html Each shared task will follow up with a separate CfP.

Multi-Perspective Scientific Document Summarization: This shared task will enable exploring methods for generating multi-perspective summaries. We introduce a novel summarization corpus, leveraging data from scientific peer reviews to capture diverse perspectives from the reader's point of view. More information coming soon at: https://github.com/guyfe/Mup

LongSumm 2022: Generation of Long Summaries for Scientific Documents. This shared task leverages blog posts created by researchers in the NLP and machine learning communities that summarize scientific articles, and uses these posts as reference summaries. The corpus for this task includes a training set that consists of 1705 extractive summaries and 531 abstractive summaries of NLP and machine learning scientific papers. More information at: https://github.com/guyfe/LongSumm

SV-Ident 2022: Survey Variable Identification in Social Science Publications. Survey variable mention identification in texts can be seen as a multi-label classification problem: Given a sentence in a document, and a list of unique variables (from a reference vocabulary of survey variables), the task is to classify which variables, if any, are mentioned in each sentence. This task is organized by the VAriable Detection, Interlinking, and Summarization (VADIS) project. Further details: https://vadis-project.github.io/sv-ident-sdp2022/

MSLR 2022: Multi-document summarization for medical literature reviews. In medicine, systematic literature reviews constitute the highest-quality evidence used to inform clinical care. However, reviews are expensive to produce manually; (semi-)automation via NLP may facilitate faster evidence synthesis without sacrificing rigor. Toward this end, we are running a shared task to study the generation of multi-document summaries in this domain. We make use of two datasets: 1) MS^2, consisting of 20K reviews (citing 470K studies) (https://github.com/allenai/ms2), and 2) Cochrane Conclusions, derived from over 4500 Cochrane reviews (https://github.com/bwallace/RCT-summarization-data). We also encourage contributions that extend this task and dataset. More information: https://sdproc.org/2022/sharedtasks.html#mslr

Scholarly Knowledge Graph Generation. A number of challenging data processing tasks are essential for the scalable creation of a comprehensive scholarly graph, i.e., a graph of entities involving but not limited to research papers, their authors, research organizations, and research themes. This shared task will evaluate three key sub-tasks involved in the generation of a scholarly graph: 1) document deduplication, i.e., identifying and linking different versions of the same scholarly document, 2) extracting research themes, and 3) affiliation mining, i.e., linking research papers or their metadata to the organizational entities that produced them. Test and evaluation data will be supplied by the CORE aggregator (https://core.ac.uk/). Pre-register your team here: https://forms.gle/7nduU6meseEpv9i69. More information: https://sdproc.org/2022/sharedtasks.html#skgg

DAGPap22: Detecting automatically generated scientific papers. In this challenge, we explore the state of the art in detecting automatically generated papers. We frame the detection problem as a binary classification task: given an excerpt of text, label it as either human-written or machine-generated. We will provide a corpus of automatically written papers, as well as documents collected by our publishing and editorial teams. As a control, we will provide a corpus of openly accessible human-written papers from the same scientific domains. We also encourage contributions that aim to extend this dataset with other computer-generated scientific papers, or papers that propose valid metrics to assess automatically generated papers against those written by humans. More information at: https://sdproc.org/2022/sharedtasks.html#dagpap

** Organizing Committee **

Arman Cohan, Allen Institute for AI, Seattle, USA
Guy Feigenblat, Piiano, Israel
Dayne Freitag, SRI International, San Diego, USA
Tirthankar Ghosal, Charles University, Czech Republic
Drahomira Herrmannova, Elsevier, USA
Petr Knoth, Open University, UK
Kyle Lo, Allen Institute for AI, Seattle, USA
Philipp Mayr, GESIS -- Leibniz Institute for the Social Sciences, Germany
Robert M. Patton, Oak Ridge National Laboratory, USA
Michal Shmueli-Scheuer, IBM Research AI, Haifa Research Lab, Israel
Anita de Waard, Elsevier, USA
Lucy Lu Wang, Allen Institute for AI, Seattle, USA



Tirthankar Ghosal

Researcher at UFAL, Charles University, CZ

