[Corpora-List] Datasets for Summarization

Mahmoud EL-Haj dr.melhaj at gmail.com
Wed Aug 12 15:07:50 CEST 2015


Dear Avinesh,

I suggest having a look at the MultiLing 2011/2013 datasets, which include news source texts, human and system summaries, and evaluation data, and are available in 10 languages (Arabic, Chinese, Czech, English, French, Greek, Hebrew, Hindi, Romanian and Spanish) [1], [2], [3].

The work was accomplished with the help of various participants, who translated, summarised and evaluated the output, and it involved many universities around the globe.

Ref:

[1] TAC 2011 MultiLing Pilot Overview

http://www.nist.gov/tac/publications/2011/additional.papers/Summarization2011_MultiLing_overview.proceedings.pdf

[2] Multi-document multilingual summarization corpus preparation, Part 1: Arabic, English, Greek, Chinese, Romanian

http://aclweb.org/anthology/W/W13/W13-3101.pdf

[3] ACL 2013 MultiLing Workshop

http://www.aclweb.org/anthology/W13-3103

Datasets direct download:

Multiling 2011: http://multiling.iit.demokritos.gr/file/view/353/tac-2011-multiling-pilot-dataset-all-files-source-texts-human-and-system-summaries-evaluation-data

Multiling 2013: https://docs.google.com/uc?id=0B31rakzMfTMZRTZiM29UR3VxYmc&export=download

Best,

Mahmoud

--

Dr Mahmoud El-Haj

Senior Research Associate

School of Computing and Communications

Lancaster University

http://www.lancaster.ac.uk/staff/elhaj/

m.el-haj at lancaster.ac.uk

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Avinesh PVS

Sent: Wednesday, August 12, 2015 9:55 AM

To: corpora at uib.no

Subject: [Corpora-List] Datasets for Summarization

Dear corpora members,

I am looking for datasets available for summarization, ideally in the news and educational domains, but anything would do at the moment.

It would be great if someone could provide pointers.

PS: Data pointers other than TAC & TREC would be highly appreciated.

Thanks & Regards

Avinesh
