[Corpora-List] Document summarization evaluation dataset needed

Ralf Steinberger ralf.steinberger at jrc.ec.europa.eu
Thu Dec 11 13:47:33 CET 2014

Dear Tomáš,

You did not mention the language of the summarisation corpus you are looking for, but at the following web site, you can find manually produced single-document and multi-document summaries in seven languages, together with many more multilingual parallel corpora:


The International Standard Language Resource Number (ISLRN) for this ‘Multilingual summary evaluation data’ is 762-292-165-648-8 <http://islrn.org/resources/762-292-165-648-8>.

The data is described in detail in:

Marco Turchi, Josef Steinberger, Mijail Kabadjov & Ralf Steinberger (2010). Using Parallel Corpora for Multilingual (Multi-Document) Summarisation Evaluation. Multilingual and Multimodal Information Access Evaluation. Springer Lecture Notes in Computer Science, LNCS 6360, pp. 52-63.

I hope you find this useful.

All the best,


Ralf Steinberger

European Commission – Joint Research Centre (JRC)

URL – Applications: <http://emm.newsbrief.eu/overview.html>

URL – The science behind them: <http://ipsc.jrc.ec.europa.eu/?id=179>

21027 Ispra (VA), Italy

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Tomáš Kociský
Sent: 10 December 2014 21:25
To: corpora at uib.no
Subject: [Corpora-List] Document summarization evaluation dataset needed

Hi All,

Could anyone provide me with pointers to datasets for evaluating (single) document summarization (extractive and/or abstractive) for research purposes? I was unable to obtain the DUC datasets.

Alternatively, if you have any of the DUC datasets please contact me!

Many thanks,

Tomas Kocisky
