Once more the Great Koos brain, a fantastic contraption holding people around the globe enthralled, strikes again: it stretches, yawns, seems to be off to a slow start, belches, and then, unexpectedly, spews forth a fraction of its amazing knowledge.

Send money.

Seriously, reading this kind of stuff will deepen your understanding of what is really going on. NLP formulae are full of equivalences, the best known being the one between push-down automata and context-free languages. But Kullback-Leibler divergence and Multinomial Bayes have also been suggested to be the same.
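
To make the tf*idf and cross-entropy claim in the thread below a bit more concrete, here is a minimal Python sketch, not anyone's official formulation. It assumes p(t) is a term's relative frequency within one document and q(t) = df(t)/N, which is not a properly normalized distribution, so "cross-entropy-like" is the honest label. The function names and the toy counts are made up for illustration.

```python
import math
from collections import Counter

def tfidf_sum(doc_tokens, df, n_docs):
    """Sum of tf(t) * log(N / df(t)) over the distinct terms of one document."""
    tf = Counter(doc_tokens)
    return sum(count * math.log(n_docs / df[t]) for t, count in tf.items())

def cross_entropy_like(doc_tokens, df, n_docs):
    """-SUM p(t) log q(t), with p(t) = tf(t)/len(doc) and q(t) = df(t)/N (assumed)."""
    tf = Counter(doc_tokens)
    length = len(doc_tokens)
    return -sum((count / length) * math.log(df[t] / n_docs)
                for t, count in tf.items())

# Toy data, invented for this example.
doc = ["cat", "cat", "sat", "mat"]
df = {"cat": 2, "sat": 5, "mat": 1}
n_docs = 10

lhs = tfidf_sum(doc, df, n_docs)
rhs = len(doc) * cross_entropy_like(doc, df, n_docs)
print(lhs, rhs)  # the two values agree up to floating-point error
```

Under these assumptions the tf*idf sum is just the document length times the cross-entropy-like quantity. Restricting the sum to query terms, as a retrieval score does, yields only part of that sum, which is the objection raised in the quoted reply.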

>> Are you sure it is a cross entropy? You need to sum over all x in
>> CrossEntropy = -SUM p(x) log q(x). "For all x" would mean for all words in
>> the documents, not for all words in the query, since the tf is the tf in the
>> document.
>>> Just FYI and making conversation: did you guys know tf*idf is equivalent
>>> to Shannon's cross-entropy?
>>>> First, there is no canonical TF-IDF formulation, and rather TF-IDF is a
>>>> family of methods based around a set of intuitions involving TF and DF.
>>>> But yes, you are correct that one of the standard implementations logs
>>>> the IDF (incl. in BM25), as a means of (monotonically) down-scaling the
>>>> IDF factor relative to the TF. Otherwise, for large document collections,
>>>> singleton terms absolutely dominate the calculation. There is usually
>>>> also some additive smoothing of the DF to avoid high-DF terms (in all
>>>> documents) getting a weight of 0.
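
The damping described above can be checked with a few lines of Python. This is only a sketch: the `1 + N/df` smoothing is one of several variants in circulation and is an assumption here, not necessarily the one the poster had in mind, and the function names and collection size are invented.

```python
import math

def raw_idf(df, n_docs):
    """Unlogged inverse document frequency: N / df."""
    return n_docs / df

def log_idf(df, n_docs):
    """Log-scaled IDF: log(N / df). Monotone in raw_idf, but much flatter."""
    return math.log(n_docs / df)

def smoothed_log_idf(df, n_docs):
    """One possible additive smoothing (assumed): log(1 + N / df)."""
    return math.log(1 + n_docs / df)

n_docs = 1_000_000  # hypothetical collection size

# A singleton term (df=1) versus a moderately rare term (df=1000).
print(raw_idf(1, n_docs) / raw_idf(1000, n_docs))   # about 1000x apart
print(log_idf(1, n_docs) / log_idf(1000, n_docs))   # about 2x apart

# A term that occurs in every document.
print(log_idf(n_docs, n_docs))           # weight of exactly 0 without smoothing
print(smoothed_log_idf(n_docs, n_docs))  # stays positive with smoothing
```

Without the log, the singleton outweighs the df=1000 term by three orders of magnitude; with it, by a factor of about two.
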
>>>> > Dear All,
>>>> >
>>>> > Anyone care to answer the question of why is there a log in IDF of
>>>> > TF-IDF?
>>>> >
>>>> > Regards,
>>>> > Liling
