I suggest having a look at the MultiLing 2011/2013 datasets, which include news source texts, human and system summaries, and evaluation data, and are available in 10 languages (Arabic, Chinese, Czech, English, French, Greek, Hebrew, Hindi, Romanian and Spanish) [1], [2], [3].
The work was accomplished with the help of many participants, who translated, summarised and evaluated the output, and it involved many universities around the globe.
Ref:
[1] TAC 2011 MultiLing Pilot Overview
[2] Multi-document multilingual summarization corpus preparation, Part 1: Arabic, English, Greek, Chinese, Romanian
http://aclweb.org/anthology/W/W13/W13-3101.pdf
[3] ACL 2013 MultiLing Workshop
http://www.aclweb.org/anthology/W13-3103
Datasets direct download:
MultiLing 2013: https://docs.google.com/uc?id=0B31rakzMfTMZRTZiM29UR3VxYmc&export=download
Best, Mahmoud
--
Dr Mahmoud El-Haj
Senior Research Associate
School of Computing and Communications
Lancaster University
http://www.lancaster.ac.uk/staff/elhaj/
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Avinesh PVS
Sent: Wednesday, August 12, 2015 9:55 AM
To: corpora at uib.no
Subject: [Corpora-List] Datasets for Summarization
Dear corpora members,
I am looking for datasets available for summarization, ideally in the news and educational domains, but anything would do at the moment.
It would be great if someone could provide pointers.
PS: Data pointers other than TAC & TREC would be highly appreciated.
Thanks & Regards
Avinesh