[Corpora-List] software semantic similarity between texts

Dom Widdows widdows at google.com
Mon Oct 20 16:43:26 CEST 2008


Hi Antonio,

An option I'd like to add to Scott's list is the Semantic Vectors package (http://semanticvectors.googlecode.com). The package is quite stable, and judging by the reasonably frequent feedback and questions we get on the mailing list, users have found it pretty easy to get started with.

Semantic Vectors uses random projection, which scales much better than some of the matrix factorization techniques used in latent semantic analysis. There is some evidence that for small corpora the more traditional singular value decomposition (SVD) gives more accurate results, though I think there is much yet to be learned in this area. It should also be relatively easy to add SVD as an option, though I haven't done this yet. If you want to use SVD, you could also try the older Infomap package at http://infomap-nlp.sourceforge.net/.
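
To give a rough idea of what random projection looks like in practice, here is a minimal Python sketch of the general technique. It is not the Semantic Vectors API or its actual implementation: the function names, the dimension (200), the number of non-zero entries per random vector (10), and the toy corpus are all assumptions made for illustration. Each document gets a sparse random vector, each term vector is the sum of the random vectors of the documents the term occurs in, and a short text is compared to another by the cosine of their summed term vectors.

```python
import math
import random
from collections import defaultdict

DIM = 200    # reduced dimensionality (assumed value for illustration)
SEEDS = 10   # non-zero entries per random vector (assumed value)


def random_index_vector(rng, dim=DIM, seeds=SEEDS):
    """Return a sparse ternary vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), seeds)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def build_term_vectors(docs, rng):
    """Sum, for every term, the random vectors of the documents it occurs in."""
    term_vectors = defaultdict(lambda: [0.0] * DIM)
    for doc in docs:
        doc_vec = random_index_vector(rng)
        for term in doc.lower().split():
            tv = term_vectors[term]
            for i, x in enumerate(doc_vec):
                tv[i] += x
    return dict(term_vectors)


def text_vector(text, term_vectors):
    """Represent a short text as the sum of its known term vectors."""
    vec = [0.0] * DIM
    for term in text.lower().split():
        # Unknown terms contribute nothing.
        for i, x in enumerate(term_vectors.get(term, ())):
            vec[i] += x
    return vec


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus (hypothetical); a real corpus would be far larger.
docs = [
    "bank money loan",
    "bank money interest",
    "river water flow",
    "river water fish",
]
term_vectors = build_term_vectors(docs, random.Random(0))

related = cosine(text_vector("bank", term_vectors),
                 text_vector("money", term_vectors))
unrelated = cosine(text_vector("bank", term_vectors),
                   text_vector("river", term_vectors))
print("bank/money:", round(related, 3))
print("bank/river:", round(unrelated, 3))
```

No matrix is ever factorized here: building the vectors is a single pass over the corpus, which is where the scaling advantage comes from. The price is that the random vectors are only approximately orthogonal, so similarities carry some noise.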

Best wishes, Dominic

On Mon, Oct 20, 2008 at 10:00 AM, Scott A. Crossley <sacrossley at gmail.com> wrote:
> Latent Semantic Analysis should do the trick. There are a variety of tools
> on the website that should help you out.
>
> http://lsa.colorado.edu/
>
> Scott Crossley, Ph.D.
> Linguistics/TESOL
>
> Department of English
> Mississippi State University
> http://www.msstate.edu/dept/english/tesol/tesolfaculty.html
> (662) 325-2355
>
> Institute for Intelligent Systems
> University of Memphis
> http://mnemosyne.csl.psyc.memphis.edu/iis/
>
>
> -----Original Message-----
> From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
> Antonio Toral
> Sent: Monday, October 20, 2008 7:51 AM
> To: corpora at uib.no
> Subject: [Corpora-List] software semantic similarity between texts
>
> Dear Corpora members,
>
> I'm looking for some software that computes semantic similarity between
> small texts (e.g. WordNet glosses, dictionary definitions). I am aware of
> simFinder, but it seems that it is no longer available. Does anyone know
> of any available software to do this?
>
> Thanks!
>
> Regards,
> --
> Antonio Toral
>
> Istituto di Linguistica Computazionale
> Consiglio Nazionale delle Ricerche
> Area della Ricerca di Pisa
>
> http://www.dlsi.ua.es/~atoral/
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora