[Corpora-List] Why is there a log in IDF of TF-IDF?

Koos Wilt kooswilt at gmail.com
Fri Jun 1 10:25:29 CEST 2018


Just FYI and making conversation: did you guys know tf*idf is equivalent to Shannon's cross-entropy?

-K
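One standard way to link the log in IDF to information theory is to read IDF as self-information. Assume $N$ documents, of which $\mathrm{df}(t)$ contain term $t$. The identity below covers only the IDF factor. It does not by itself establish the cross-entropy equivalence claimed above.

```latex
\mathrm{idf}(t) = \log \frac{N}{\mathrm{df}(t)} = -\log \hat{P}(t),
\qquad \hat{P}(t) = \frac{\mathrm{df}(t)}{N}
```

Here $\hat{P}(t)$ is the estimated probability that a randomly chosen document contains $t$. The log turns the ratio $N/\mathrm{df}(t)$ into the information, in nats or bits depending on the base, carried by observing $t$ in a document.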

2018-06-01 6:28 GMT+02:00 <tbaldwin at gmail.com>:


> First, there is no canonical TF-IDF formulation; rather, TF-IDF is a family
> of methods based around a set of intuitions involving TF and DF. But yes, you
> are correct that one of the standard implementations logs the IDF (including
> in BM25), as a means of (monotonically) down-scaling the IDF factor relative
> to the TF. Otherwise, for large document collections, singleton terms (those
> occurring in only one document) absolutely dominate the calculation. There is
> usually also some additive smoothing of the DF to avoid high-DF terms (those
> occurring in all documents) getting a weight of 0.
>
>
> Tim
>
> On Fri, 2018-06-01 at 10:20 +0800, liling tan wrote:
> > Dear All,
> >
> > Anyone care to answer the question of why there is a log in the IDF of
> > TF-IDF?
> >
> > Regards,
> > Liling
> > _______________________________________________
> > UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> > Corpora mailing list
> > Corpora at uib.no
> > https://mailman.uib.no/listinfo/corpora
>
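The two points in Tim's reply can be checked numerically with a small sketch. First, the log compresses the gap between rare and common terms. Second, smoothing keeps terms that occur in every document from getting a weight of 0. The collection size is hypothetical. The smoothed formula is only one variant: the one scikit-learn documents for `TfidfVectorizer(smooth_idf=True)`. Other systems, including BM25 implementations, smooth differently.

```python
import math


def raw_idf(n_docs: int, df: int) -> float:
    """Unlogged inverse document frequency, N / df."""
    return n_docs / df


def log_idf(n_docs: int, df: int) -> float:
    """Classic log-scaled IDF, ln(N / df); equals 0 when df == N."""
    return math.log(n_docs / df)


def smoothed_idf(n_docs: int, df: int) -> float:
    """One smoothed variant: ln((1 + N) / (1 + df)) + 1.

    Adding 1 to N and df avoids division by zero; the trailing +1 keeps
    terms that occur in every document from receiving a weight of 0.
    """
    return math.log((1 + n_docs) / (1 + df)) + 1


if __name__ == "__main__":
    N = 1_000_000  # hypothetical collection size
    for df in (1, 1_000, N):
        print(
            f"df={df:>9}  raw={raw_idf(N, df):>12.1f}  "
            f"log={log_idf(N, df):6.2f}  smoothed={smoothed_idf(N, df):6.2f}"
        )
```

With N = 10^6, a singleton term's raw IDF is 1000 times that of a term with df = 1000. Its logged IDF is only twice as large (ln 10^6 vs ln 10^3). A term present in every document gets a log IDF of exactly 0 but a smoothed IDF of 1.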


