# [Corpora-List] Local Similarity in LDA

Yashar Najafloo yasharnajafloo at yahoo.com
Fri May 20 05:36:39 CEST 2016

Hi Waseem,

Thanks for your reply. The equation you sent is great, as there is no conditional probability between D and w, which is what I wanted. However, two questions remain:

1. What if there is only one document (meaning D1 = D2)?
2. What if there are more than two documents?

Your answer was exactly what I was after, though. I just need the above questions clarified and I will be the happiest person in the world ;)

Regards,
Yashar Najaflou

On Friday, 20 May 2016 2:58 PM, Waseem Helmi Gharbieh <waseem.gharbieh at unb.ca> wrote:

Hi Yashar,

From my understanding, you can find P(w1|w2) by decomposing it into:

P(w1|w2) = P(w1|D1) P(D1|D2) P(D2|w2)

You can compute this probability by decomposing each of the three probabilities, so you get (assuming 10 topics):

P(w1|w2) = (SUM (z=1 to 10) [P(w1|z) P(z|D1)]) (SUM (z=1 to 10) [P(D1|z) P(z|D2)]) (SUM (z=1 to 10) [P(D2|z) P(z|w2)])

Is this what you wanted?

Waseem Gharbieh

From: corpora-bounces at uib.no <corpora-bounces at uib.no> on behalf of Yashar Najafloo <yasharnajafloo at yahoo.com>
Sent: 19 May 2016 18:24:40
To: Alexander Yeh; CORPORA at UIB.NO
Subject: Re: [Corpora-List] Local Similarity in LDA

Hi Alex,

Thanks for your comment. Yes, exactly. The equation I gave was the similarity between two words across the whole corpus (global similarity), and my question is how to calculate the similarity of two words in a specific document (local similarity), given that we already know how much each document is related to the different topics. I mean I would like an equation that reflects the topic proportions of individual documents as well. Having said that, my logic says that if we know how probable it is for a word to appear throughout the topics and how probable it is for the topics to be the subject of a document, we can calculate local similarity by multiplying the topic proportions of a given document into the rows of the topic-word table (the table from which we worked out the global similarity), then work out the math again and call it local similarity. Does what I am saying make any sense, and is it mathematically correct?

Regards,
Yashar
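The three-factor decomposition in Waseem's reply can be transcribed almost literally into code. The following is only a sketch under stated assumptions: `phi` (topic-word table, P(w|z)) and `theta` (document-topic proportions, P(z|d)) are hypothetical array names, and the quantities P(D|z) and P(z|w), which LDA does not output directly, are obtained by Bayes' rule assuming every document is equally likely a priori. The thread does not say how those terms should be computed, so that choice is an assumption of this sketch.

```python
import numpy as np

def chained_similarity(phi, theta, w1, w2, d1, d2):
    """Literal transcription of P(w1|D1) * P(D1|D2) * P(D2|w2).

    phi[z, w]   = P(w|z)  (topic-word table)
    theta[d, z] = P(z|d)  (document-topic proportions)
    Assumption: documents are equally likely a priori, so P(d|z) and P(z)
    follow from theta by Bayes' rule.
    """
    p_z = theta.mean(axis=0)                 # P(z) under a uniform P(d)
    p_d_given_z = theta / theta.sum(axis=0)  # P(d|z); each topic column sums to 1
    joint = phi[:, w2] * p_z                 # P(w2|z) P(z) for each topic z
    p_z_given_w2 = joint / joint.sum()       # P(z|w2)

    p_w1_given_d1 = phi[:, w1] @ theta[d1]          # sum_z P(w1|z) P(z|D1)
    p_d1_given_d2 = p_d_given_z[d1] @ theta[d2]     # sum_z P(D1|z) P(z|D2)
    p_d2_given_w2 = p_d_given_z[d2] @ p_z_given_w2  # sum_z P(D2|z) P(z|w2)
    return float(p_w1_given_d1 * p_d1_given_d2 * p_d2_given_w2)

# Toy model: 2 topics, 3 words, 2 documents (made-up numbers for illustration).
phi = np.array([[0.5, 0.4, 0.1],
                [0.1, 0.2, 0.7]])
theta = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
value = chained_similarity(phi, theta, w1=0, w2=1, d1=0, d2=1)
```

Because the result is a product of three probabilities, it depends on which documents D1 and D2 are chosen, which is why the function takes them as explicit arguments.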

On Thursday, 19 May 2016 7:34 PM, Alexander Yeh <asy at mitre.org> wrote:

Yashar Najafloo wrote:
> Hi there,
>
> I have a question with regard to similarity between two words in LDA
> (Latent Dirichlet Allocation) and was wondering if anyone can kindly
> help me out.
> I'll try to keep it short.
>
> I have a corpus and have analysed it using LDA with variational inference. I
> now know how much each document is about each topic and how much each
> topic is about each word in my word list. I know the similarity
> between two words can be calculated from the topics the two share,
> which is the sum (over, say, 10 topics) of the conditional probability of
> word one given topic z multiplied by the conditional probability of topic z
> given word two:
> P(w1|w2) = SUM (z=1 to 10) [P(w1|z) P(z|w2)]

The above looks to be the similarity of 2 words according to the LDA topic models. It may be interesting to compare this with a P(w1|w2) calculated directly from the documents themselves: (# of documents with both w1 and w2)/(# of documents with w2)

-Alex
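
The comparison Alex suggests can be sketched in a few lines. This is a minimal illustration, not anyone's actual implementation: `phi` (P(w|z)) and `topic_prior` (P(z)) are hypothetical inputs, P(z|w2) is obtained from them by Bayes' rule (the thread does not specify how it is computed), and each document is represented as a set of word ids.

```python
import numpy as np

def lda_word_similarity(phi, topic_prior, w1, w2):
    """P(w1|w2) = sum_z P(w1|z) P(z|w2), with P(z|w2) from Bayes' rule."""
    joint = phi[:, w2] * topic_prior    # P(w2|z) P(z) for each topic z
    p_z_given_w2 = joint / joint.sum()  # P(z|w2)
    return float(phi[:, w1] @ p_z_given_w2)

def cooccurrence_similarity(docs, w1, w2):
    """(# documents containing both w1 and w2) / (# documents containing w2)."""
    with_w2 = [d for d in docs if w2 in d]
    if not with_w2:
        return float("nan")             # w2 never occurs, so the ratio is undefined
    return sum(1 for d in with_w2 if w1 in d) / len(with_w2)

# Toy data: 2 topics, 3 words, 4 documents (made-up numbers for illustration).
phi = np.array([[0.5, 0.4, 0.1],
                [0.1, 0.2, 0.7]])
topic_prior = np.array([0.5, 0.5])
docs = [{0, 1}, {1, 2}, {0, 1, 2}, {2}]

model_based = lda_word_similarity(phi, topic_prior, w1=0, w2=1)
count_based = cooccurrence_similarity(docs, w1=0, w2=1)
```

The two numbers measure different things: the model-based value is a per-token probability under the topic model, while the count-based value is the fraction of documents containing w2 that also contain w1, so they need not agree closely.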

>
> The question is how to calculate the similarity of two words in
> particular documents (we know how much the documents are about topics).
> I was thinking of taking the topic proportions of a document as weights,
> multiplying the topics by those weights, and then working out the
> above-mentioned math. Is what I am trying to achieve mathematically correct?
>
> Regards,
> Yashar
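
One possible reading of the weighting idea in the original question is to keep the global formula but replace the corpus-wide topic prior with the topic proportions of the document in question, so that P(z|w2, d) is proportional to P(w2|z) P(z|d). The sketch below assumes that reading; it is not confirmed anywhere in the thread, and `phi` (P(w|z)) and `theta_d` (P(z|d) for a single document) are hypothetical names.

```python
import numpy as np

def local_word_similarity(phi, theta_d, w1, w2):
    """P(w1|w2, d) = sum_z P(w1|z) P(z|w2, d).

    Assumption: P(z|w2, d) is proportional to P(w2|z) P(z|d), i.e. the
    document's topic proportions act as the weights on the topics.
    """
    weighted = phi[:, w2] * theta_d           # P(w2|z) P(z|d) for each topic z
    p_z_given_w2_d = weighted / weighted.sum()
    return float(phi[:, w1] @ p_z_given_w2_d)

# Toy model: 2 topics, 3 words (made-up numbers for illustration).
phi = np.array([[0.5, 0.4, 0.1],
                [0.1, 0.2, 0.7]])
doc_a = np.array([0.9, 0.1])   # document mostly about topic 0
doc_b = np.array([0.1, 0.9])   # document mostly about topic 1

sim_a = local_word_similarity(phi, doc_a, w1=0, w2=1)
sim_b = local_word_similarity(phi, doc_b, w1=0, w2=1)
```

Under this reading, the same word pair gets a higher similarity in a document dominated by the topic in which both words are probable (here `doc_a`) than in one dominated by the other topic (`doc_b`).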