Hi Alex,

Thanks for your comment. Yes, exactly. The equation I gave was the similarity between two words across the whole corpus (global similarity). My question is how to calculate the similarity of two words within a specific document (local similarity), given that we already know how strongly each document is related to the different topics. In other words, I would like an equation that also reflects the topic proportions of individual documents.

My reasoning is this: if we know how probable it is for a word to appear in each topic, and how probable it is for each topic to be the subject of a document, then we can calculate local similarity by multiplying the topic proportions of a given document into the rows of the topic-word table (the table we used to work out the global similarity), then working out the math again and calling the result local similarity. Does this make sense, and is it mathematically correct?

Regards,
Yashar
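One possible reading of the proposal above, sketched in Python under stated assumptions: the document's topic proportions theta_d take the place of the corpus-wide topic prior when P(w2|z) is turned into P(z|w2). That is the same as scaling row z of the topic-word table by theta_d(z) and renormalising the w2 column. The scaling is applied once; reusing the scaled rows for P(w1|z) as well would count theta_d twice. The function name `local_similarity` and the inputs `phi` (topic-word table, `phi[z][w] = P(w|z)`) and `theta_d` (document topic proportions) are hypothetical, not part of any LDA library.

```python
from typing import Sequence


def local_similarity(phi: Sequence[Sequence[float]],
                     theta_d: Sequence[float],
                     w1: int, w2: int) -> float:
    """Hypothetical document-specific similarity P_d(w1 | w2).

    phi[z][w]  = P(w | z): K x V topic-word table, each row sums to 1.
    theta_d[z] = P(z | d): topic proportions of document d, sums to 1.

    P_d(w1 | w2) = sum_z P(w1 | z) * P(z | w2, d),
    where P(z | w2, d) is proportional to P(w2 | z) * theta_d[z].
    """
    K = len(phi)
    # Scale the w2 column of the topic-word table by the document's topic
    # proportions (the document replaces the corpus-wide topic prior).
    weights = [phi[z][w2] * theta_d[z] for z in range(K)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("w2 has zero probability under this document's topics")
    p_z_given_w2_d = [wt / total for wt in weights]
    # Use the unscaled P(w1 | z) so theta_d enters only once.
    return sum(phi[z][w1] * p_z_given_w2_d[z] for z in range(K))


# Toy example with 2 topics and 3 words (made-up numbers).
phi = [[0.5, 0.3, 0.2],
       [0.1, 0.2, 0.7]]
print(local_similarity(phi, [0.9, 0.1], 0, 1))  # about 0.4724
print(local_similarity(phi, [0.1, 0.9], 0, 1))  # about 0.1571
```

In this toy case the same word pair scores higher in a document dominated by topic 0 than in one dominated by topic 1. With a uniform theta_d the result reduces to the global similarity computed with a uniform topic prior.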

> P(w1|w2)=SUM (z=1 to 10) [P(w1|z)P(z|w2)]
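A minimal Python sketch of the quoted formula, computing P(w1|w2) for every word pair. LDA gives P(w|z) directly, but P(z|w2) has to be derived; here it is assumed to come from Bayes' rule with a corpus-wide topic distribution `p_z` (for example, the average of all documents' topic proportions). The names `global_similarity_matrix`, `phi` and `p_z` are hypothetical, and the toy table uses 2 topics rather than the 10 in the formula.

```python
from typing import List, Sequence


def global_similarity_matrix(phi: Sequence[Sequence[float]],
                             p_z: Sequence[float]) -> List[List[float]]:
    """S[w1][w2] = P(w1 | w2) = sum_z P(w1 | z) * P(z | w2).

    phi[z][w] = P(w | z): K x V topic-word table, each row sums to 1.
    p_z[z]    = P(z): assumed corpus-wide topic distribution.
    Assumes every word has nonzero probability under some topic
    (true for smoothed LDA estimates).
    """
    K, V = len(phi), len(phi[0])
    # Bayes' rule per word: P(z | w) = P(w | z) P(z) / sum_z' P(w | z') P(z')
    p_z_given_w = []
    for w in range(V):
        joint = [phi[z][w] * p_z[z] for z in range(K)]
        total = sum(joint)
        p_z_given_w.append([j / total for j in joint])
    # Sum over topics for each (w1, w2) pair.
    return [[sum(phi[z][w1] * p_z_given_w[w2][z] for z in range(K))
             for w2 in range(V)]
            for w1 in range(V)]


phi = [[0.5, 0.3, 0.2],
       [0.1, 0.2, 0.7]]
S = global_similarity_matrix(phi, [0.5, 0.5])
print(S[0][1], S[1][0])  # about 0.34 and 0.2833
```

Note that this similarity is not symmetric: P(w1|w2) and P(w2|w1) generally differ. Each column of `S` sums to 1, since it is a distribution over w1 for a fixed w2.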

The above looks like the similarity of two words according to the LDA topic model. It may be interesting to compare it with P(w1|w2) calculated directly from the documents themselves: (# of documents containing both w1 and w2) / (# of documents containing w2)

-Alex
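The document-count estimate suggested above can be sketched in a few lines of Python. This assumes each document is already tokenized into a list of words; the function name `cooccurrence_conditional` and the toy documents are hypothetical.

```python
from typing import Iterable


def cooccurrence_conditional(docs: Iterable[Iterable[str]],
                             w1: str, w2: str) -> float:
    """Estimate P(w1 | w2) as
    (# documents containing both w1 and w2) / (# documents containing w2).
    """
    with_w2 = 0
    with_both = 0
    for doc in docs:
        tokens = set(doc)  # only presence in a document matters, not counts
        if w2 in tokens:
            with_w2 += 1
            if w1 in tokens:
                with_both += 1
    if with_w2 == 0:
        raise ValueError(f"{w2!r} does not occur in any document")
    return with_both / with_w2


docs = [["gene", "protein", "cell"],
        ["gene", "dna"],
        ["protein", "cell"],
        ["gene", "protein", "protein"]]
print(cooccurrence_conditional(docs, "protein", "gene"))  # 2 of 3 "gene" docs
```

Like the LDA-based score, this estimate is asymmetric in w1 and w2, so the two can be compared pair by pair on the same word ids.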

