[Corpora-List] Why is there a log in IDF of TF-IDF?

liling tan alvations at gmail.com
Fri Jun 1 04:27:07 CEST 2018


Dear All,

Would anyone care to explain why there is a log in the IDF part of TF-IDF?


From Robertson (2004), it seems that the log "is not in general important":
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.7340&rep=rep1&type=pdf


From a computational perspective, taking the log does put a cap on how large
the ratio with numerator N (the total number of documents) can grow, which
helps prevent overflow if we start multiplying IDF values. Taking the log also
turns the product into a sum.
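
To make the computational point concrete, here is a minimal sketch. It assumes the plain form idf = log(N / df); the collection size and document frequencies are made-up numbers for illustration, not taken from any real corpus.

```python
import math


def idf(n_docs: int, doc_freq: int) -> float:
    """Plain log IDF, assuming the form log(N / df)."""
    return math.log(n_docs / doc_freq)


N = 1_000_000  # hypothetical collection size

# A short query: the log of the product of raw ratios
# equals the sum of the log IDF values.
small_dfs = [10, 250, 4000]  # hypothetical document frequencies
small_product = 1.0
for df in small_dfs:
    small_product *= N / df
small_log_sum = sum(idf(N, df) for df in small_dfs)

# A long query of very rare terms: the raw product exceeds the
# double-precision range and becomes inf, while the sum of logs
# stays a modest finite number.
rare_dfs = [1] * 100
raw_product = 1.0
for df in rare_dfs:
    raw_product *= N / df
log_sum = sum(idf(N, df) for df in rare_dfs)

print(math.log(small_product), small_log_sum)
print(raw_product, log_sum)
```

This only illustrates the numerical argument (bounded magnitudes, product becoming a sum); it says nothing about whether the log matters for ranking quality.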

Are there other reasons to take the log?

Regards,
Liling