[Corpora-List] mutual similarity

Adam Kilgarriff adam at lexmasterclass.com
Wed Jun 28 07:36:01 CEST 2006


This area has blossomed in recent years and there is ample work on the topic.

Greg Grefenstette explored it in detail in his thesis and associated book
(Explorations in Automatic Thesaurus Discovery, Kluwer, 1994;
<http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=527911>). Dekang
Lin introduced a new
measure which has been adopted by quite a few people (including myself) in
his COLING 1998 paper. Lillian Lee compared various measures in her thesis,
see her papers in Proc ACL 1999. Since 2003, two excellent theses on the
question are by Julie Weeds (Sussex Univ) and James Curran (Edinburgh Univ).
Both of them are authors and co-authors on various papers further exploring
the topic - see e.g., Weeds and Weir in the latest CL, 31 (4) 2005. Geffet
and Dagan (COLING 2004) is another thought-provoking paper. In ACL-COLING
2006, Gorman and Curran move on to the next question: what are the
computational issues in producing thesauruses from very large (billion+
word) corpora?



-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Stefano Vegnaduzzo
Sent: 28 June 2006 05:28
Subject: [Corpora-List] mutual similarity

Dear all,

I would like to ask for pointers/literature/references/etc on the topic of
mutual (or reciprocal) similarity. Here is what I mean by this:

Given a term t0 and a set of terms t1 ... tn, a similarity measure M
typically allows you to rank the terms t1 ... tn according to their similarity
to t0.

My question: Given a term t0 and a set of terms t1 ... tn, and a similarity
measure M, and assuming a non-symmetric similarity relation (i.e., M(t1,t2)
is different from M(t2,t1)), how do you compute the mutual similarity MS of
t0 with respect to each term t1 ... tn, where M(t0,ti) is different from
M(ti,t0)? In other words, I am interested in computing and ranking the
mutual similarity of all pairs MS(t0,ti), where MS(t0,ti) is some function
of M(t0,ti) and M(ti,t0).

Cases of interest are for example those where M(t0,tX) is a bit higher than
M(t0,tY) but M(tY,t0) is much higher than M(tX,t0), so I would like a mutual
similarity measure to capture this by assigning MS(t0,tY) a higher score
than MS(t0,tX).
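
To make the case above concrete, here is a minimal sketch of one possible
MS. It assumes non-negative scores and uses the harmonic mean as the
combining function, which is only one candidate (min or the geometric mean
would behave similarly) and is not taken from any of the cited work. The
scores for tX and tY are made-up numbers chosen to match the situation
described.

```python
from statistics import harmonic_mean


def mutual_similarity(forward, backward, combine=harmonic_mean):
    """Combine M(t0,ti) and M(ti,t0) into a single symmetric score.

    `combine` takes a list of two numbers. The harmonic mean (an assumed
    choice) is pulled towards the smaller value, so a pair scores high
    only when both directions are high. It requires non-negative inputs.
    """
    return combine([forward, backward])


# Hypothetical directional scores: term -> (M(t0,term), M(term,t0)).
scores = {
    "tX": (0.60, 0.10),  # slightly higher forward score, weak backward score
    "tY": (0.55, 0.50),  # slightly lower forward score, strong backward score
}

# Ranking on M(t0,ti) alone puts tX first.
by_forward = sorted(scores, key=lambda t: scores[t][0], reverse=True)

# Ranking on the combined score puts tY first.
by_mutual = sorted(scores, key=lambda t: mutual_similarity(*scores[t]),
                   reverse=True)

print(by_forward)
print(by_mutual)
```

Passing combine=min gives a stricter variant in which the weaker direction
alone determines the score.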

I found very limited references in the literature. For example, D. Hindle,
"Noun classification from predicate-argument structures" (1990), defines
reciprocal similarity as the case where two terms are each other's most
similar term, but this is way too restrictive for what I am interested in.
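
For comparison, the reciprocal criterion as described above (two terms are
each other's most similar term) can be sketched as follows. This is only an
illustration of that description, not Hindle's implementation; the nested
dictionary layout and the example scores are assumptions made for the
sketch.

```python
def reciprocal_pairs(sim):
    """Return the pairs of terms that are each other's most similar term.

    `sim[a][b]` holds M(a, b); a term is assumed not to be listed among
    its own neighbours.
    """
    # Most similar term for every term that has at least one neighbour.
    nearest = {a: max(row, key=row.get) for a, row in sim.items() if row}
    # Keep a pair only when the nearest-neighbour relation holds both ways.
    return {frozenset((a, b)) for a, b in nearest.items()
            if nearest.get(b) == a}


# Hypothetical asymmetric scores.
sim = {
    "boat": {"ship": 0.8, "car": 0.3},
    "ship": {"boat": 0.7, "car": 0.2},
    "car": {"boat": 0.4, "ship": 0.1},
}

pairs = reciprocal_pairs(sim)
print(pairs)
```

Here "car" is closest to "boat", but "boat" is closest to "ship", so "car"
ends up in no pair at all. This all-or-nothing behaviour is what makes the
criterion too restrictive for ranking.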

Any help will be appreciated,

Stefano Vegnaduzzo
