Koos Wilt kooswilt at gmail.com
Fri Jun 1 11:35:57 CEST 2018

Once more the Great Koos brain, a fantastic contraption holding people around the globe enthralled in awe, strikes: it stretches, yawns, seems to be off to a slow start, belches, and then, unexpectedly, spews forth a fraction of its amazing knowledge.

Send money.

Seriously, reading this kind of stuff will deepen your understanding of what is really going on. NLP formulae are full of equivalences, the best known being the one between push-down automata and context-free languages. But Kullback-Leibler divergence and multinomial Bayes have also been suggested to be the same.

-K

2018-06-01 10:39 GMT+02:00 Koos Wilt <kooswilt at gmail.com>:

> Will look it up. Thanks.
>
> -K
>
> 2018-06-01 10:38 GMT+02:00 Bob Luk <csrluk at gmail.com>:
>
>> Are you sure it is a cross-entropy? You need to sum over all x in
>> CrossEntropy(p, q) = -SUM_x p(x) log q(x). "All x" would mean all words in
>> the documents, not just the words in the query, since the tf is the tf in
>> the document.
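
A minimal sketch of the point about the range of the sum, using the standard definition H(p, q) = -SUM_x p(x) log q(x). The toy document, the collection model `q`, and the query are made-up values for illustration only; nothing here shows that tf*idf equals a cross-entropy. It only shows that summing over the query words gives a partial sum, not the full quantity.

```python
import math
from collections import Counter

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log q(x), taken over every x with p(x) > 0."""
    return -sum(px * math.log(q[x]) for x, px in p.items() if px > 0)

# Toy document; p is its relative term-frequency distribution.
doc = "the cat sat on the mat".split()
tf = Counter(doc)
p = {w: c / len(doc) for w, c in tf.items()}

# Hypothetical collection-level word distribution (assumed, sums to 1).
q = {"the": 0.5, "cat": 0.1, "sat": 0.1, "on": 0.2, "mat": 0.1}

# Full cross-entropy: the sum runs over all words in the document.
full = cross_entropy(p, q)

# Restricting the same sum to the query words only gives a partial sum.
query = {"cat", "mat"}
partial = -sum(p[w] * math.log(q[w]) for w in query)

print(f"full sum over document words: {full:.4f}")
print(f"partial sum over query words: {partial:.4f}")
```

The partial sum is strictly smaller here because every dropped term is positive, which is why the two ranges of summation cannot be treated as interchangeable.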
>>
>> Cheers,
>>
>> Robert Luk
>>
>> On Fri, Jun 1, 2018 at 4:25 PM, Koos Wilt <kooswilt at gmail.com> wrote:
>>
>>> Just FYI and making conversation: did you guys know tf*idf is equivalent
>>> to Shannon's cross-entropy?
>>>
>>> -K
>>>
>>> 2018-06-01 6:28 GMT+02:00 <tbaldwin at gmail.com>:
>>>
>>>> First, there is no canonical TF-IDF formulation, and rather TF-IDF is a
>>>> family of methods based around a set of intuitions involving TF and DF.
>>>> But yes, you are correct that one of the standard implementations logs
>>>> the IDF (incl in BM25), as a means of (monotonically) down-scaling the
>>>> IDF factor relative to the TF. Otherwise for large document collections,
>>>> singleton terms absolutely dominate the calculation. There is usually
>>>> also some additive smoothing of the DF to avoid high DF terms (in all
>>>> documents) getting a weight of 0.
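
A minimal sketch of the two effects described above, assuming a collection of one million documents. The function names are illustrative, and the smoothed form log((1 + N) / (1 + df)) + 1 is just one common variant, not a canonical definition.

```python
import math

def idf_raw(n_docs, df):
    """Unlogged inverse document frequency: N / df."""
    return n_docs / df

def idf_log(n_docs, df):
    """Logged IDF: log(N / df). Monotone in df, but far more compressed."""
    return math.log(n_docs / df)

def idf_smoothed(n_docs, df):
    """One additive-smoothing variant (an assumption, not the only choice):
    log((1 + N) / (1 + df)) + 1, which stays above 0 even when df == N."""
    return math.log((1 + n_docs) / (1 + df)) + 1.0

N = 1_000_000  # assumed collection size
for df in (1, 1_000, N):
    print(f"df={df:>9}  raw={idf_raw(N, df):>12.1f}  "
          f"log={idf_log(N, df):7.3f}  smoothed={idf_smoothed(N, df):7.3f}")
```

With these numbers, a singleton term outweighs a term with df = 1000 by a factor of 1000 under the raw ratio but only by a factor of 2 after the log, and a term that occurs in every document gets weight 0 from the plain log but 1 from the smoothed form.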
>>>>
>>>>
>>>> Tim
>>>>
>>>> On Fri, 2018-06-01 at 10:20 +0800, liling tan wrote:
>>>> > Dear All,
>>>> >
>>>> > Anyone care to answer the question of why there is a log in the IDF
>>>> > of TF-IDF?
>>>> >
>>>> > Regards,
>>>> > Liling
>>>> > _______________________________________________
>>>> > Corpora mailing list
>>>> > Corpora at uib.no
>>>> > https://mailman.uib.no/listinfo/corpora