[Corpora-List] Short Text Corpus

Torsten Zesch zesch at tk.informatik.tu-darmstadt.de
Wed May 26 14:32:55 CEST 2010


Dear Khaled,


> I'm looking for a corpus of short text pairs (e.g. sentence pairs) for
> measuring similarity. Could anyone please suggest a link to such a
> resource?

Here are some pointers to datasets that have been used for that purpose before:

a) Microsoft Research Paraphrase Corpus http://research.microsoft.com/en-us/downloads/607d14d9-20cd-47e3-85bc-a2f65cd28042/

b) Li, Y., McLean, D., Bandar, Z., O'Shea, J., and Crockett, K. (2006). Sentence similarity based on semantic nets and corpus statistics. IEEE Transactions on Knowledge and Data Engineering, 18(8):1138-1150.

http://www2.docm.mmu.ac.uk/STAFF/J.Oshea/TRMMUCCA20081_5.pdf

c) Lee, M. D., Pincombe, B., and Welsh, M. (2005). An empirical evaluation of models of text document similarity. In Proceedings of the 27th Annual Conference of the Cognitive Science Society, pages 1254-1259.

This dataset is available upon request, as far as I know.

-Torsten


> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> KHALED OMAR
> Sent: Thursday, 20 May 2010 10:17
> To: corpora at uib.no
> Subject: [Corpora-List] Short Text Corpus
>
> Dear all,
>
> I'm looking for a corpus of short text pairs (e.g. sentence pairs) for
> measuring similarity. Could anyone please suggest a link to such a
> resource?
>
>
>
> Thank you so much in advance.
>
>
>
> Khaled
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora


