http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.1716
Not sure we still have the data, but it shouldn't be too difficult to recreate (feel free to contact me offline).
HTH,
Tony

--
Tony Russell-Rose PhD FBCS CITP
Vice-chair, BCS IRSG
Chair, IEHF HCI Group
http://uxlabs.co.uk
http://isquared.wordpress.com

On 04/03/2014 15:48, Ivelina Nikolova wrote:
> Dear corpora members,
>
> I am looking for a gold standard to train/evaluate document similarity
> metrics.
> Can anyone suggest a suitable corpus for such purposes? I'm especially
> interested in similarity between newspaper articles.
>
> Thanks in advance,
> Ivelina
>