[Corpora-List] Algorithm for hypernym chaining?

Ken Litkowski ken at clres.com
Fri Jun 25 16:50:08 CEST 2010


I think that graph algorithms (search for "algorithms for directed graphs") should do the trick efficiently. The key to your problem is that each of your entries constitutes an "edge" between "vertices". You compile a list of such edges and the algorithms then go through a depth-first search to construct the full digraph. Although these algorithms are NP-complete, the number of vertices (e.g., in WordNet) is sufficiently small that your task can be accomplished in at most a matter of a few minutes. WordNet has the added advantage that there are no cycles. (I have C++ code I'm willing to share, but it might not be the most efficient for your purposes.)

Adam Funk wrote:
> Sorry if this is a bit off-topic, but at least it's for an NLP
> application. I have a file of pairs of WordNet synset numbers that
> represent hypernym-hyponym pairs, for example:
>
> 100001930,100001740
> 100002137,100001740
> 100002452,100001930
> 100002684,100001930
> 100003553,100002684
> 100003993,100003553
> 100004258,100003553
> 100004475,100004258
> 100005787,100004475
> 100005930,100004475
>
> and I want to produce all the longest possible chains of them. The
> output for that list is as follows:
>
> 100002137,100001740
> 100002452,100001930,100001740
> 100003993,100003553,100002684,100001930,100001740
> 100005787,100004475,100004258,100003553,100002684,100001930,100001740
> 100005930,100004475,100004258,100003553,100002684,100001930,100001740
>
> I have a program that does this correctly for small samples of the data
> I'm supposed to process, but I've estimated (using a log plot) that it
> will take about 10^595 seconds to run over the whole list (about 89000
> pairs), so I think I'm doing it wrong.
>
> I haven't been able to find an algorithm for this, because I don't know
> what the computational problem is called. Can anyone point me in the
> direction of something suitable?
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>

-- Ken Litkowski TEL.: 301-482-0237 CL Research EMAIL: ken at clres.com 9208 Gue Road Home Page: http://www.clres.com Damascus, MD 20872-1025 USA Blog: http://www.clres.com/blog



More information about the Corpora mailing list