[Corpora-List] question as to MI and t score

JFS jfs at di.fct.unl.pt
Thu Dec 15 16:05:01 CET 2005

Ramesh Krishnamurthy wrote:


> Please see http://torvald.aksis.uib.no/corpora/1999-4/0146.html

> If I have understood correctly, the MI score tells you about the
> 'strength of association' (but if the corpus frequency figures for
> either item are very low, then you may not have much confidence in
> the association; e.g. extreme case: X and Y occur only once each in
> the corpus, but in that one occurrence, they are adjacent to each
> other); t-score takes into account the corpus frequency of the items,
> so gives you a 'confidence rating' in the association...

> I suspect that the corpus frequencies for ['play' and 'role'] and
> ['fight' and 'battle'] would also have to be similar for you to make
> the claim that they have a similar overall collocational
> relationship...

> Hope this helps
> Ramesh



> At 16:14 14/12/2005, Helene Stengers wrote:

>> Dear list,

>> Imagine you have called up collocation listings for the node word
>> lemmas "play" and "fight". In both lists, the association with, for
>> example, the collocates "role" and "battle" has exactly the same
>> MI / t score. Can I assume that both collocations, i.e. "play a role"
>> and "fight a battle", have the same "collocational strength", or is
>> that a wrong assumption?

>> Thanks,
>> Helene


> Ramesh Krishnamurthy
> Lecturer in English Studies
> School of Languages and Social Sciences
> Aston University, Birmingham B4 7ET, UK
> Tel: +44 (0)121-204-3812
> Fax: +44 (0)121-204-3766
> http://www.aston.ac.uk/lss/english/



The MI measure is not independent of the bigram frequency. This can be
seen when X and Y form a perfect co-occurrence bigram (X occurs only to
the left of Y, and Y occurs only to the right of X); in these cases MI
gives higher scores to bigrams of low frequency.
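
To make this concrete, here is a minimal sketch in Python. It assumes MI is
computed in the usual way as log2(N * f(X,Y) / (f(X) * f(Y))), with N the
corpus size in tokens; the function name, the corpus size and the
frequencies are made up for illustration. With perfect co-occurrence,
f(X) = f(Y) = f(X,Y) = f, so MI reduces to log2(N / f), which falls as f
grows.

```python
import math

def mi(f_xy, f_x, f_y, n):
    """Mutual information (base 2) of a bigram from raw corpus counts.

    Assumes probabilities are estimated as count / n, where n is the
    corpus size in tokens.
    """
    return math.log2((f_xy * n) / (f_x * f_y))

N = 1_000_000  # hypothetical corpus size

# Perfect co-occurrence: f(X) = f(Y) = f(X,Y) = f, so MI = log2(N / f).
# The rarer the pair, the higher the score.
for f in (1, 10, 100, 1000):
    print(f, round(mi(f, f, f, N), 2))
```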

Try scp(X,Y) = f(X,Y)² / (f(X) * f(Y)). It gives the cohesion between X
and Y and it is independent of the bigram frequency: for a perfect
co-occurrence bigram it equals 1 whether the pair occurs once or a
thousand times.

Or try cosine(X,Y) = f(X,Y)/ sqrt(f(X) * f(Y)). It is also independent
of the bigram frequency.

Both measures give values from 0 to 1.
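
A small Python sketch of the two measures, written directly from the
formulas above. The function names and the example counts are only
illustrative assumptions, not taken from any particular toolkit.

```python
import math

def scp(f_xy, f_x, f_y):
    """scp(X,Y) = f(X,Y)^2 / (f(X) * f(Y))."""
    return f_xy ** 2 / (f_x * f_y)

def cosine(f_xy, f_x, f_y):
    """cosine(X,Y) = f(X,Y) / sqrt(f(X) * f(Y))."""
    return f_xy / math.sqrt(f_x * f_y)

# Perfect co-occurrence: both measures stay at 1 whatever the frequency.
for f in (1, 10, 1000):
    print(f, scp(f, f, f), cosine(f, f, f))

# Hypothetical counts for a looser pair: f(X,Y) = 50, f(X) = 200, f(Y) = 100.
print(scp(50, 200, 100), cosine(50, 200, 100))
```

Note that, as defined here, cosine is simply the square root of scp, so
the two always rank a set of bigrams in the same order.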


Joaquim Ferreira da Silva | Tel: +351 21 294 8536
Professor Auxiliar | +351 21 291 8330 ext: 10732
Departamento de Informática | Fax: +351 21 294 8541
Fac. de Ciências e Tecnologia |jfs at di.fct.unl.pt
Universidade Nova de Lisboa |http://terra.di.fct.unl.pt/~jfs/
2829-516 Caparica, PORTUGAL
