[Corpora-List] Log-likelihood (was : Re: Questions about t-score)

Stefan Evert stefan.evert at uos.de
Thu Apr 30 00:21:41 CEST 2009




Hi again!


> I read both of these documents with the greatest interest, since I've
> been using association measures intensively.
> I have a question regarding the log-likelihood computed from a
> contingency table. In some cases, I obtain zero values for O_12 or O_21
> (following your notation). The log-likelihood is then undefined,
> because log(O_12/E_12) (or log(O_21/E_21)) is undefined.

If any of the observed frequencies is zero, you simply drop the corresponding term from the log-likelihood summation. The mathematical rationale is that in this case

O_ij * log (O_ij / E_ij) = 0 * log 0 = 0

by continuous extension, because lim[x -> 0] x * log x = 0.
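
To make the rule concrete, here is a minimal sketch in Python. It assumes the usual form G2 = 2 * sum O_ij * log(O_ij / E_ij), with expected frequencies E_ij = (row total * column total) / N. The function name log_likelihood and its argument order are only illustrative, not taken from any particular package.

```python
import math

def log_likelihood(o11, o12, o21, o22):
    """G2 for a 2x2 contingency table, dropping terms with O_ij = 0.

    Illustrative sketch: assumes G2 = 2 * sum O_ij * log(O_ij / E_ij)
    with E_ij = (row total * column total) / N.
    """
    observed = [[o11, o12], [o21, o22]]
    n = o11 + o12 + o21 + o22
    rows = [o11 + o12, o21 + o22]
    cols = [o11 + o21, o12 + o22]
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o == 0:
                # 0 * log 0 is taken to be 0, so the term is dropped
                continue
            # o > 0 implies that both marginals (and hence e) are > 0
            e = rows[i] * cols[j] / n
            total += o * math.log(o / e)
    return 2.0 * total

# A table with O_12 = 0 still yields a finite score
score = log_likelihood(10, 0, 5, 85)
```

Because a non-zero O_ij implies non-zero row and column totals, the division and the logarithm are always well defined for the terms that remain.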


> How should such situations be handled to keep a balanced, homogeneous
> score? Most of the time, zero values are simply ignored
> (log(O_12/E_12) is simply replaced by 0), but I feel that the
> log-likelihood computed that way can no longer be interpreted
> correctly.

No, it's mathematically correct to ignore these terms, and log-likelihood scores can still be interpreted in the normal way.

BTW, most association measures can handle contingency tables containing zeroes (usually O_12 or O_21, but possibly also O_11), provided they are implemented properly and take care of all special cases. They will often break down, however, if the _expected_ frequencies become zero, i.e. for degenerate contingency tables where an entire row or column is zero.
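
The distinction between zero observed and zero expected frequencies can be checked before computing any score. The following is a rough sketch in Python, assuming E_ij = (row total * column total) / N; the helper names expected_frequencies and is_degenerate are made up for illustration.

```python
def expected_frequencies(o11, o12, o21, o22):
    """Expected frequencies E_ij = (row total * column total) / N (illustrative helper)."""
    n = o11 + o12 + o21 + o22
    if n == 0:
        raise ValueError("empty contingency table")
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    return [[r1 * c1 / n, r1 * c2 / n],
            [r2 * c1 / n, r2 * c2 / n]]

def is_degenerate(o11, o12, o21, o22):
    """True if an entire row or column is zero, so that some E_ij is zero."""
    return min(o11 + o12, o21 + o22, o11 + o21, o12 + o22) == 0

# O_12 = 0 alone is harmless: all expected frequencies stay positive
harmless = is_degenerate(10, 0, 5, 85)   # False
# An all-zero first row makes E_11 and E_12 zero
broken = is_degenerate(0, 0, 5, 85)      # True
```

How a degenerate table should be treated (skipped, flagged, or assigned some default score) depends on the application; the sketch only detects the case.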

Hope this helps, Stefan


