[Corpora-List] Why is there a log in IDF of TF-IDF?

tbaldwin at gmail.com
Fri Jun 1 06:28:32 CEST 2018


First, there is no canonical TF-IDF formulation; rather, TF-IDF is a family of methods built around a set of intuitions involving TF and DF. But yes, you are correct that one of the standard implementations takes the log of the IDF (including in BM25), as a means of (monotonically) down-scaling the IDF factor relative to the TF. Otherwise, for large document collections, singleton terms (those with a DF of 1) absolutely dominate the calculation. There is usually also some additive smoothing of the DF to stop high-DF terms (those occurring in all documents) from getting a weight of 0.
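
To make the scaling argument concrete, here is a rough sketch in Python. The collection size, the term counts, and the function names are all made up for illustration, and the smoothed variant, log(1 + N/DF), is just one of several in use rather than the formula of any particular system.

```python
import math

def raw_idf(n_docs, df):
    # Unlogged inverse document frequency: N / DF.
    return n_docs / df

def log_idf(n_docs, df):
    # Logged IDF: log(N / DF). Equals 0 when the term is in every document.
    return math.log(n_docs / df)

def smoothed_log_idf(n_docs, df):
    # One illustrative smoothing variant (an assumption, not a standard):
    # log(1 + N / DF), which stays above 0 even when DF == N.
    return math.log(1 + n_docs / df)

# Hypothetical collection of one million documents.
N = 1_000_000

# A topical term: 20 occurrences in this document, found in 10,000 documents.
topical_tf, topical_df = 20, 10_000
# A singleton term (say, a typo): 1 occurrence, found in this document only.
singleton_tf, singleton_df = 1, 1

raw_topical = topical_tf * raw_idf(N, topical_df)
raw_singleton = singleton_tf * raw_idf(N, singleton_df)
log_topical = topical_tf * log_idf(N, topical_df)
log_singleton = singleton_tf * log_idf(N, singleton_df)

print("raw IDF:", raw_topical, "vs", raw_singleton)
print("log IDF:", log_topical, "vs", log_singleton)
print("term in all documents:", log_idf(N, N), "vs", smoothed_log_idf(N, N))
```

With the raw IDF the singleton outweighs the topical term by a large factor despite its TF of 1; with the logged IDF the ordering flips, and the smoothed form keeps a term that occurs in every document from dropping to exactly 0.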

Tim

On Fri, 2018-06-01 at 10:20 +0800, liling tan wrote:
> Dear All,
>
> Would anyone care to answer the question of why there is a log in the IDF of TF-IDF?
>
> Regards,
> Liling
