I'm looking for a software package that I can use to generate the document similarity matrix for a small corpus of 50 documents, using several of the standard algorithms such as tf-idf, Okapi, language models, cosine, LSA, etc.
Research code is fine; I just want a trusted implementation of these
algorithms. Languages, in order of preference, are [Python, C, C++], [Java],
[Perl]; beyond those it's not really preferred anymore, but fine nonetheless
:)
I want to correlate these with human ratings in a research setting.
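To make the target concrete, here is a minimal sketch of the kind of output I mean. It assumes whitespace tokenization and a plain tf * log(N/df) weighting with cosine similarity; the function names and the three toy documents are made up for illustration, and this is not the trusted implementation I am asking for.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (raw tf * log(N/df))."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents each term occurs in.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(docs):
    """Return the full N x N document similarity matrix as nested lists."""
    vecs = tfidf_vectors(docs)
    return [[cosine(u, v) for v in vecs] for u in vecs]


# Hypothetical toy corpus, only to show the shape of the result.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock markets fell sharply",
]
matrix = similarity_matrix(docs)
for row in matrix:
    print(" ".join(f"{value:.3f}" for value in row))
```

For my 50 documents this would give a 50 x 50 matrix, and the off-diagonal entries are what I would compare against the human ratings.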
Thank you very much!
Stephan.