http://searchivarius.org/blog/tf-idf-simply-cross-entropy
Send money.
Seriously, reading this kind of stuff will deepen your understanding of what is really going on. NLP formulae are full of equivalences, the best known being the one between push-down automata and context-free languages. But Kullback-Leibler divergence and Multinomial Bayes have also been suggested to be the same.
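For anyone who wants to poke at the claim, here is a minimal sketch of the identity as I read it. The assumptions are mine, not taken from the thread: p(w) = tf(w, d) / |d| is the document's empirical word distribution, q(w) = df(w) / N stands in for the collection "distribution" (it does not sum to 1, so this is only cross-entropy-like), and idf(w) = log(N / df(w)). All function names and the toy collection are made up for illustration.

```python
import math
from collections import Counter


def tfidf_sum(doc, df, n_docs):
    """Sum of tf(w) * log(N / df(w)) over the distinct words of doc."""
    tf = Counter(doc)
    return sum(count * math.log(n_docs / df[w]) for w, count in tf.items())


def cross_entropy_like(doc, df, n_docs):
    """-sum_w p(w) log q(w), with p(w) = tf / |doc| and q(w) = df / N (assumed)."""
    tf = Counter(doc)
    length = len(doc)
    return -sum(
        (count / length) * math.log(df[w] / n_docs) for w, count in tf.items()
    )


# Toy collection, invented for this example.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat", "down"],
    ["the", "cat", "and", "the", "dog"],
]
# Document frequency: number of documents containing each word.
df = Counter(w for d in docs for w in set(d))
n_docs = len(docs)

doc = docs[2]
lhs = tfidf_sum(doc, df, n_docs)
rhs = len(doc) * cross_entropy_like(doc, df, n_docs)
print(lhs, rhs)  # the two agree up to floating-point error
```

Note that the sum runs over the words of the document. Restricted to the query terms only, it is a partial sum, which is the objection Robert Luk raises in the quoted message below.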
-K
2018-06-01 10:39 GMT+02:00 Koos Wilt <kooswilt at gmail.com>:
> Will look it up. Thanks.
>
> -K
>
> 2018-06-01 10:38 GMT+02:00 Bob Luk <csrluk at gmail.com>:
>
>> Are you sure it is a cross entropy? You need to sum over all x in
>> CrossEntropy(p, q) = -SUM_x p(x) log q(x). Over all x would mean over all
>> words in the document, not over all words in the query, since the tf is
>> the tf in the document.
>>
>> Cheers,
>>
>> Robert Luk
>>
>> On Fri, Jun 1, 2018 at 4:25 PM, Koos Wilt <kooswilt at gmail.com> wrote:
>>
>>> Just FYI and making conversation: did you guys know tf*idf is equivalent
>>> to Shannon's cross-entropy?
>>>
>>> -K
>>>
>>> 2018-06-01 6:28 GMT+02:00 <tbaldwin at gmail.com>:
>>>
>>>> First, there is no canonical TF-IDF formulation, and rather TF-IDF is a
>>>> family of methods based around a set of intuitions involving TF and DF.
>>>> But yes, you are correct that one of the standard implementations logs
>>>> the IDF (incl. in BM25), as a means of (monotonically) down-scaling the
>>>> IDF factor relative to the TF. Otherwise for large document collections,
>>>> singleton terms absolutely dominate the calculation. There is usually
>>>> also some additive smoothing of the DF to avoid high DF terms (in all
>>>> documents) getting a weight of 0.
>>>>
>>>>
>>>> Tim
>>>>
>>>> On Fri, 2018-06-01 at 10:20 +0800, liling tan wrote:
>>>> > Dear All,
>>>> >
>>>> > Anyone care to answer the question of why is there a log in IDF of TF-IDF?
>>>> >
>>>> > Regards,
>>>> > Liling
>>>> > _______________________________________________
>>>> > UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>>> > Corpora mailing list
>>>> > Corpora at uib.no
>>>> > https://mailman.uib.no/listinfo/corpora
>>>>
>>>
>>>
>>
>
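On Tim's point quoted above about why the IDF is logged, a small sketch may help show the scale of the effect. It compares the raw ratio N / df with its log, plus one smoothed variant. The smoothing used here, log(1 + N / df), is just one choice among several and is my assumption, not necessarily the one Tim has in mind. The function names and the collection size are hypothetical.

```python
import math


def raw_idf(n_docs, df):
    """Unlogged inverse document frequency: N / df."""
    return n_docs / df


def log_idf(n_docs, df):
    """Logged IDF: log(N / df); this is 0 for a term found in every document."""
    return math.log(n_docs / df)


def smoothed_log_idf(n_docs, df):
    """One possible smoothing (an assumption): log(1 + N / df), never 0."""
    return math.log(1 + n_docs / df)


n_docs = 1_000_000  # hypothetical collection size
for df in (1, 1_000, n_docs):
    print(
        df,
        raw_idf(n_docs, df),
        round(log_idf(n_docs, df), 3),
        round(smoothed_log_idf(n_docs, df), 3),
    )
```

Unlogged, a singleton term outweighs a term found in 1,000 documents by a factor of 1,000. After the log, the gap shrinks to a factor of 2 (log of 10^6 against log of 10^3), so the TF has a chance to matter. The smoothed variant keeps a term that occurs in every document at a small positive weight instead of 0.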