[Corpora-List] Why is there a log in IDF of TF-IDF?
tbaldwin at gmail.com
Fri Jun 1 06:28:32 CEST 2018
First, there is no canonical TF-IDF formulation; rather, TF-IDF is a family
of methods built around a set of intuitions involving TF and DF. But yes, you
are correct that one of the standard implementations logs the IDF (including in
BM25), as a means of (monotonically) down-scaling the IDF factor relative to the
TF. Otherwise, for large document collections, singleton terms absolutely
dominate the calculation. There is usually also some additive smoothing of the
DF so that high-DF terms (those occurring in every document) do not get a weight of 0.
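To illustrate the down-scaling argument, here is a minimal Python sketch (not from the original post). The function names `raw_idf`, `log_idf` and `smoothed_log_idf` are hypothetical, the collection size and document frequencies are made-up numbers, and the smoothed form log((1 + N) / (1 + df)) + 1 is just one common choice of smoothing, assumed here for illustration.

```python
import math

# Hypothetical helpers for illustration; N = number of documents, df = document frequency.


def raw_idf(n_docs: int, df: int) -> float:
    """Unlogged IDF, N / df."""
    return n_docs / df


def log_idf(n_docs: int, df: int) -> float:
    """Logged IDF, log(N / df): same ordering of terms, much smaller range.

    This is 0 for a term occurring in every document (df == N).
    """
    return math.log(n_docs / df)


def smoothed_log_idf(n_docs: int, df: int) -> float:
    """One common smoothed variant (an assumption here): log((1 + N) / (1 + df)) + 1.

    The outer + 1 keeps a term occurring in every document (df == N) from
    getting weight 0; the inner + 1s also avoid division by zero when df == 0.
    """
    return math.log((1 + n_docs) / (1 + df)) + 1


N = 1_000_000  # made-up collection size

# A singleton term (df = 1) seen once vs. a term with df = 1000 seen 50 times.
print(1 * raw_idf(N, 1), 50 * raw_idf(N, 1000))  # 1000000.0 50000.0
print(1 * log_idf(N, 1), 50 * log_idf(N, 1000))  # about 13.8 vs about 345.4
print(log_idf(N, N), smoothed_log_idf(N, N))     # 0.0 1.0
```

Without the log, one occurrence of the singleton term outweighs 50 occurrences of the df = 1000 term by a factor of 20. With the log, the TF difference decides the comparison, while the IDF ordering of the two terms is unchanged because the log is monotonic.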
Tim
On Fri, 2018-06-01 at 10:20 +0800, liling tan wrote:
> Dear All,
>
> Does anyone care to answer the question of why there is a log in the IDF of TF-IDF?
>
> Regards,
> Liling
> Corpora mailing list
> Corpora at uib.no
> https://mailman.uib.no/listinfo/corpora