[Corpora-List] question as to MI and t score

Stefan Evert stefan.evert at uos.de
Thu Dec 15 18:04:00 CET 2005

Dear Helene,

I suppose your rationale is that since MI and t-score measure two
different aspects of collocations (MI not being sensitive to absolute
frequency per se, while t-score is very sensitive in this respect), if
both values are the same for "play - role" and "fight - battle", the
"collocational strength" should be the same in all respects. Is this
interpretation correct?

However, if both scores are the same for the two collocations, this
means simply that both the observed frequencies and the expected
frequencies of "play - role" and "fight - battle" are identical (you can
work this out relatively easily from equations, e.g. those given on
www.collocations.de/AM). While this doesn't indicate a difference in the
degree of collocation, of course, it no more "proves" that the
collocational strength is really identical than observing the same
frequency for a phenomenon in two different corpora proves anything
about that phenomenon in general – the observation may just as well be
due to the vagaries of sampling, especially when the frequencies are
very low.
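
To make the step from "same scores" to "same frequencies" concrete, here is a minimal sketch. It assumes the usual definitions MI = log2(O/E) and t = (O - E)/sqrt(O), with O the observed and E the expected co-occurrence frequency; the function names are made up for illustration only.

```python
import math


def mi_score(o, e):
    """Mutual information, assuming MI = log2(O / E)."""
    return math.log2(o / e)


def t_score(o, e):
    """t-score, assuming t = (O - E) / sqrt(O)."""
    return (o - e) / math.sqrt(o)


def recover_frequencies(mi, t):
    """Invert the two scores to get (O, E) back.

    From MI we get the ratio r = O / E = 2 ** MI. Substituting
    E = O / r into the t-score gives t = sqrt(O) * (1 - 1 / r),
    which can be solved for O whenever r != 1.
    """
    ratio = 2.0 ** mi
    if ratio == 1.0:
        # O == E: t is 0 for every O, so the scores do not pin down O.
        raise ValueError("MI = 0 does not determine the frequencies")
    sqrt_o = t / (1.0 - 1.0 / ratio)
    if sqrt_o <= 0:
        raise ValueError("MI and t have inconsistent signs")
    o = sqrt_o ** 2
    return o, o / ratio


# Hypothetical frequencies, chosen only to illustrate the round trip.
o, e = recover_frequencies(mi_score(30, 2.5), t_score(30, 2.5))
print(round(o, 6), round(e, 6))
```

Since each pair of scores maps back to a single pair (O, E), two word pairs with identical MI and t-score necessarily have identical observed and expected frequencies (leaving aside the degenerate case O = E).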

What you can do is to rule out a large difference between the
collocational strengths of "play a role" and "fight a battle" with a
certain degree of statistical confidence. Working out exactly what upper
bounds on this difference one can assume, and with how much confidence, is
almost as difficult a mathematical problem as interpreting the
difference is a linguistic one (what does it really mean if the
difference in collocational strength is at most "1.7"?).
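
For a rough feel of the kind of bound meant here, the following sketch computes an approximate interval for the difference in MI between two word pairs. It is a crude approximation, not an exact solution: it assumes the observed frequencies are Poisson-distributed, treats the expected frequencies as known constants, and uses the delta-method approximation Var(ln O) ≈ 1/O, which is unreliable for small O. The function name and the example frequencies are hypothetical.

```python
import math


def mi_difference_interval(o1, e1, o2, e2, z=1.96):
    """Approximate interval for MI(pair 1) - MI(pair 2).

    Assumptions (all simplifications):
      * O1 and O2 are independent Poisson counts,
      * E1 and E2 are treated as fixed, known values,
      * Var(ln O) is approximated by 1 / O (delta method).
    z = 1.96 corresponds to roughly 95% confidence.
    """
    diff = math.log2(o1 / e1) - math.log2(o2 / e2)
    # Standard error of ln(O1) - ln(O2), converted from nats to bits.
    se = math.sqrt(1.0 / o1 + 1.0 / o2) / math.log(2)
    return diff - z * se, diff + z * se


# Identical frequencies for both pairs: the point estimate of the
# difference is 0, but the interval still has noticeable width.
low, high = mi_difference_interval(30, 2.5, 30, 2.5)
print(low, high)

# With much lower frequencies the interval becomes far wider.
low_rare, high_rare = mi_difference_interval(3, 0.25, 3, 0.25)
print(low_rare, high_rare)
```

Under these assumptions, the interval says only that a difference larger than its half-width is unlikely; it does not show that the two collocations are equally strong, and it says nothing about how such a difference should be interpreted linguistically.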

Best regards,



> Imagine you have called up collocation listings for the node word
> lemmas "play" and "fight". In both lists, the association with, for
> example, the collocates "role" and "battle" has exactly the same MI
> / t score. Can I assume that both collocations, i.e. "play a role" and
> "fight a battle", have the same "collocational strength", or is that a
> wrong assumption?
>
> Thanks,
> Helene
