-K
2018-06-01 6:28 GMT+02:00 <tbaldwin at gmail.com>:
> First, there is no canonical TF-IDF formulation; rather, TF-IDF is a family
> of methods based around a set of intuitions involving TF and DF. But yes, you
> are correct that one of the standard implementations logs the IDF (including
> in BM25), as a means of (monotonically) down-scaling the IDF factor relative
> to the TF. Otherwise, for large document collections, singleton terms
> absolutely dominate the calculation. There is usually also some additive
> smoothing of the DF to avoid high-DF terms (those occurring in all documents)
> getting a weight of 0.
>
> Tim
>
> On Fri, 2018-06-01 at 10:20 +0800, liling tan wrote:
> > Dear All,
> >
> > Anyone care to answer the question of why there is a log in the IDF of
> > TF-IDF?
> >
> > Regards,
> > Liling
> > _______________________________________________
> > UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> > Corpora mailing list
> > Corpora at uib.no
> > https://mailman.uib.no/listinfo/corpora
>
>
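A minimal Python sketch of Tim's point, on a hypothetical four-document toy corpus. It compares the unlogged ratio N / df, the classic log(N / df), and one common smoothed variant, log((N + 1) / (df + 1)) + 1 (the form scikit-learn's TfidfVectorizer uses with its default smooth_idf=True). The corpus and function names are illustrative only, not any particular system's implementation.

```python
import math
from collections import Counter

# Hypothetical toy corpus: four tiny "documents".
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "the zebra grazed",
]

def document_frequencies(docs):
    """Count in how many documents each term occurs (DF)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))  # set(): count each term once per document
    return df

def raw_idf(n_docs, df):
    """Unlogged inverse document frequency: N / df."""
    return n_docs / df

def log_idf(n_docs, df):
    """Classic logged IDF: log(N / df); exactly 0 for a term in every document."""
    return math.log(n_docs / df)

def smoothed_log_idf(n_docs, df):
    """One smoothed variant: log((N + 1) / (df + 1)) + 1; never 0."""
    return math.log((n_docs + 1) / (df + 1)) + 1

N = len(docs)
df = document_frequencies(docs)
for term in ["the", "cat", "zebra"]:
    print(f"{term:6} df={df[term]} raw={raw_idf(N, df[term]):.2f} "
          f"log={log_idf(N, df[term]):.2f} "
          f"smoothed={smoothed_log_idf(N, df[term]):.2f}")

# In a large collection the unlogged ratio explodes for rare terms.
big_n = 1_000_000
for d in [1, 1_000, 500_000]:
    print(f"df={d:>7} raw={raw_idf(big_n, d):>11.1f} log={log_idf(big_n, d):.2f}")
```

With N = 1,000,000, a singleton term gets a raw IDF 1,000 times that of a term with DF 1,000, but only twice its logged IDF, which is the down-scaling Tim describes. On the toy corpus, "the" (in every document) gets a logged IDF of 0 but a smoothed IDF of 1.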