[Corpora-List] Local Similarity in LDA

Alexander Yeh asy at mitre.org
Thu May 19 09:34:03 CEST 2016


Yashar Najafloo wrote:
> Hi there,
>
> I have a question regarding the similarity between two words in LDA
> (Latent Dirichlet Allocation) and was wondering if anyone could kindly
> help me out.
> I'll try to keep it short.
>
> I have a corpus that I analysed using LDA with variational inference. I
> now know how much each document is about each topic and how much each
> topic is about each word in my word list. I know the similarity
> between two words can be calculated from the amount of topic mass the
> two share, which is the sum over topics (say 10 topics) of the
> conditional probability of word one given topic z multiplied by the
> conditional probability of topic z given word two:
> P(w1|w2) = SUM (z=1 to 10) [P(w1|z) P(z|w2)]

The formula above looks like the similarity of two words according to the LDA topic model. It may be interesting to compare it with a P(w1|w2) estimated directly from the documents themselves: (# of documents containing both w1 and w2) / (# of documents containing w2)
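
A minimal sketch of that comparison, under some assumptions that are not in the original question: `phi[z, w]` holds P(w|z), `p_z[z]` is a corpus-level topic prior P(z) (for example the average of the per-document topic proportions), P(z|w2) is obtained from Bayes' rule, and each document is represented as a set of word ids. The function names are hypothetical.

```python
import numpy as np

def lda_word_given_word(phi, p_z, w1, w2):
    """P(w1|w2) = sum_z P(w1|z) P(z|w2) under the topic model."""
    # Bayes' rule: P(z|w2) is proportional to P(w2|z) P(z).
    joint = phi[:, w2] * p_z
    p_z_given_w2 = joint / joint.sum()
    return float(phi[:, w1] @ p_z_given_w2)

def cooccurrence_word_given_word(docs, w1, w2):
    """(# of documents with both w1 and w2) / (# of documents with w2)."""
    with_w2 = [d for d in docs if w2 in d]
    if not with_w2:
        return float("nan")  # w2 never occurs, so the ratio is undefined
    return sum(1 for d in with_w2 if w1 in d) / len(with_w2)

# Toy data (made up): 2 topics, 3 words, 3 documents as sets of word ids.
phi = np.array([[0.5, 0.5, 0.0],
                [0.0, 0.5, 0.5]])
p_z = np.array([0.5, 0.5])
docs = [{0, 1}, {0}, {1, 2}]

lda_estimate = lda_word_given_word(phi, p_z, w1=1, w2=0)
doc_estimate = cooccurrence_word_given_word(docs, w1=1, w2=0)
```

The two numbers measure different things: the first is smoothed through the topics, while the second only counts literal co-occurrence, so large gaps between them may be worth a closer look.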

-Alex


>
> The question is how to calculate the similarity of two words within a
> particular document (we know how much each document is about each topic).
> I was thinking of taking the document's topic proportions as weights,
> multiplying each topic's term by its weight, and working out the
> above-mentioned math. Is what I am trying to achieve mathematically correct?
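
One possible reading of this weighting idea, sketched under assumptions that go beyond the question as asked: replace the corpus-level topic prior with the document's own topic proportions P(z|d), so that P(z|w2,d) is proportional to P(w2|z) P(z|d) and P(w1|w2,d) = SUM over z of P(w1|z) P(z|w2,d). The names `phi`, `theta_d`, and `word_given_word_in_doc` are hypothetical.

```python
import numpy as np

def word_given_word_in_doc(phi, theta_d, w1, w2):
    """P(w1|w2,d) = sum_z P(w1|z) P(z|w2,d), with P(z|w2,d) ∝ P(w2|z) P(z|d)."""
    # phi[z, w] = P(w|z); theta_d[z] = P(z|d) for the document of interest.
    joint = phi[:, w2] * theta_d
    total = joint.sum()
    if total == 0:
        return float("nan")  # w2 has zero probability under this document
    p_z_given_w2_d = joint / total
    return float(phi[:, w1] @ p_z_given_w2_d)

# Toy data (made up): 2 topics, 3 words, one document dominated by topic 0.
phi = np.array([[0.6, 0.4, 0.0],
                [0.2, 0.3, 0.5]])
theta_d = np.array([0.9, 0.1])

in_doc = word_given_word_in_doc(phi, theta_d, w1=1, w2=0)
```

With this form the result is still a proper conditional distribution over w1 for each document, which is not guaranteed if the topic terms are simply multiplied by the weights without renormalising.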
>
> Regards,
> Yashar