On Thursday, 19 May 2016 7:34 PM, Alexander Yeh <asy at mitre.org> wrote:
Yashar Najafloo wrote:
> Hi there,
> I have a question about the similarity between two words in LDA
> (Latent Dirichlet Allocation) and was wondering if anyone could kindly
> help me out.
> I'll try to keep it short.
> I have a corpus and analysed it using LDA with variational inference. I
> now know how much each document is about each topic and how much each
> topic is about each word in my word list. I know the similarity
> between two words can be calculated from the amount of topic mass the
> two share, which is the sum over topics (say 10) of the conditional
> probability of word one given topic z multiplied by the conditional
> probability of topic z given word two.
> P(w1|w2)=SUM (z=1 to 10) [P(w1|z)P(z|w2)]
The above looks like the similarity of two words according to the LDA
topic model. It may be interesting to compare it with a P(w1|w2)
estimated directly from the documents themselves:
(# of documents containing both w1 and w2) / (# of documents containing w2)
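
A minimal sketch of that comparison, under stated assumptions: `phi`, `p_z`, and `docs` are hypothetical toy values, not output from any real model. P(z|w2) is not something LDA reports directly, so the sketch derives it with Bayes' rule from P(w2|z) and an assumed corpus-level topic prior P(z).

```python
import numpy as np

# Hypothetical toy model: 3 topics, 4-word vocabulary.
# phi[z, w] = P(w | z); each row sums to 1.
phi = np.array([
    [0.50, 0.30, 0.10, 0.10],
    [0.10, 0.10, 0.40, 0.40],
    [0.25, 0.25, 0.25, 0.25],
])
# Assumed corpus-level topic prior P(z); sums to 1.
p_z = np.array([0.5, 0.3, 0.2])


def topic_posterior(phi, p_z, w):
    """P(z | w) via Bayes' rule: P(w|z) P(z) / sum_z' P(w|z') P(z')."""
    joint = phi[:, w] * p_z
    return joint / joint.sum()


def lda_cond_prob(phi, p_z, w1, w2):
    """P(w1 | w2) = sum_z P(w1|z) P(z|w2), the formula from the question."""
    return float(phi[:, w1] @ topic_posterior(phi, p_z, w2))


def cooccurrence_cond_prob(docs, w1, w2):
    """(# docs containing both w1 and w2) / (# docs containing w2)."""
    with_w2 = [d for d in docs if w2 in d]
    if not with_w2:
        return float("nan")  # w2 never occurs, so the ratio is undefined
    return sum(1 for d in with_w2 if w1 in d) / len(with_w2)


# Hypothetical corpus: each document is the set of word ids it contains.
docs = [{0, 1}, {0, 2}, {1, 2, 3}, {0, 1, 3}]

model_based = lda_cond_prob(phi, p_z, 0, 1)
count_based = cooccurrence_cond_prob(docs, 0, 1)
```

Note that the two numbers are not measuring quite the same thing: the topic-based value is a distribution over the whole vocabulary (it sums to 1 over w1), while the count-based value is a per-document presence rate, so only the relative ranking of word pairs is directly comparable.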
> The question is how to calculate the similarity of two words in
> particular documents (we know how much each document is about each topic).
> I was thinking of taking the document's topic proportions as weights,
> multiplying each topic's term by its weight, and then working out the
> above-mentioned sum. Is what I am trying to achieve mathematically correct?
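
One possible reading of the weighting idea, offered as a sketch and not as a settled answer: replace the corpus-level topic prior with the document's own topic proportions P(z|d) when forming the topic posterior, so that P(z|w2,d) is proportional to P(w2|z) P(z|d), and then P(w1|w2,d) = SUM_z P(w1|z) P(z|w2,d). This assumes words are conditionally independent given the topic, as in the LDA generative model. The arrays `phi` and `theta_d` below are hypothetical toy values.

```python
import numpy as np

# Hypothetical toy model: 3 topics, 4-word vocabulary.
# phi[z, w] = P(w | z); each row sums to 1.
phi = np.array([
    [0.50, 0.30, 0.10, 0.10],
    [0.10, 0.10, 0.40, 0.40],
    [0.25, 0.25, 0.25, 0.25],
])
# Hypothetical topic proportions P(z | d) for one document; sums to 1.
theta_d = np.array([0.7, 0.2, 0.1])


def doc_cond_prob(phi, theta_d, w1, w2):
    """P(w1 | w2, d) = sum_z P(w1|z) P(z|w2,d),
    with P(z|w2,d) proportional to P(w2|z) P(z|d).

    Assumes w2 has non-zero probability under the document's topic mix;
    otherwise the normalisation below divides by zero.
    """
    joint = phi[:, w2] * theta_d      # P(w2|z) P(z|d), unnormalised
    posterior = joint / joint.sum()   # P(z | w2, d)
    return float(phi[:, w1] @ posterior)


in_doc = doc_cond_prob(phi, theta_d, 0, 1)
```

Normalising after the multiplication matters here: multiplying the terms of the original sum by P(z|d) without renormalising would no longer give a probability distribution over w1.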
> Corpora mailing list
> Corpora at uib.no