Ramesh Krishnamurthy wrote:
>
> Please see http://torvald.aksis.uib.no/corpora/1999-4/0146.html
>
> If I have understood correctly, the MI score tells you about the
> 'strength of association' (but if the corpus frequency figures for
> either item are very low, then you may not have much confidence in the
> association; e.g. in the extreme case, X and Y occur only once each in
> the corpus, but in that one occurrence they are adjacent to each other);
> the t-score takes into account the corpus frequency of the items, so it
> gives you a 'confidence rating' in the association...
>
> I suspect that the corpus frequencies for ['play' and 'role'] and
> ['fight' and 'battle'] would also have to be similar for you to make
> the claim that they have a similar overall collocational relationship...
>
> Hope this helps
> Ramesh
>
>
> At 16:14 14/12/2005, Helene Stengers wrote:
>
>> Dear list,
>>
>> Imagine you have called up collocation listings for the node word
>> lemmas "play" and "fight". In both lists, the association with for
>> example the collocates "role" and "battle" has exactly the same
>> MI / t score. Can I assume that both collocations, i.e. "play a role"
>> and "fight a battle" have the same "collocational strength", or is
>> that a wrong assumption?
>>
>> Thanks,
>> Helene
>
>
Dear
The MI measure is not independent of the bigram frequency. This can be
seen when X and Y occur in a perfect co-occurrence bigram (X occurs only
to the left of Y, and Y occurs only to the right of X); in these cases MI
gives higher scores to bigrams of low frequency.
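A minimal sketch of this effect, assuming the usual pointwise MI
formula MI = log2(N * f(X,Y) / (f(X) * f(Y))) and the common t-score
t = (f(X,Y) - f(X) * f(Y) / N) / sqrt(f(X,Y)), with a hypothetical corpus
size N of 1,000,000 tokens. In a perfect co-occurrence bigram
f(X) = f(Y) = f(X,Y) = k, so MI reduces to log2(N / k) and falls as k
grows, while the t-score rises with k:

```python
import math

def mi(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """Pointwise mutual information (base 2) from raw corpus counts."""
    return math.log2(n * f_xy / (f_x * f_y))

def t_score(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """t-score: observed minus expected co-occurrence, over sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)

N = 1_000_000  # hypothetical corpus size, chosen only for illustration

# Perfect co-occurrence: X and Y each occur k times, always together,
# so f(X) = f(Y) = f(X,Y) = k.
for k in (1, 10, 100, 1000):
    print(f"k={k:>4}  MI={mi(k, k, k, N):6.2f}  t={t_score(k, k, k, N):6.2f}")
```

The same "perfect" association therefore gets its highest MI when it is
rarest, which is exactly the low-frequency case where the t-score gives
the least confidence.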
Try scp(X,Y) = f(X,Y)² / (f(X) * f(Y)). It measures the cohesion between
X and Y and, unlike MI, it is independent of the bigram frequency: a
perfect co-occurrence bigram always scores 1, however frequent it is.
Or try cosine(X,Y) = f(X,Y) / sqrt(f(X) * f(Y)). It is also independent
of the bigram frequency in the same sense.
Both measures give values from 0 to 1.
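A small sketch of both measures computed from raw counts (scp is
commonly expanded as "symmetric conditional probability"). The partial
association counts below (f(X) = 50, f(Y) = 200, f(X,Y) = 40) are
made-up illustrative numbers, not corpus data. Note that scp is simply
the square of the cosine, so the two rank bigrams identically:

```python
import math

def scp(f_xy: int, f_x: int, f_y: int) -> float:
    """Symmetric conditional probability: f(X,Y)^2 / (f(X) * f(Y))."""
    return f_xy ** 2 / (f_x * f_y)

def cosine(f_xy: int, f_x: int, f_y: int) -> float:
    """Cosine: f(X,Y) / sqrt(f(X) * f(Y))."""
    return f_xy / math.sqrt(f_x * f_y)

# Perfect co-occurrence at different frequencies: both measures stay at 1.
for k in (1, 10, 100, 1000):
    print(k, scp(k, k, k), cosine(k, k, k))

# Hypothetical partial association: X occurs 50 times, Y 200 times,
# and they occur together 40 times.
print(scp(40, 50, 200), cosine(40, 50, 200))
```

Because f(X,Y) can never exceed f(X) or f(Y), both values stay within
0 to 1.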
Joaquim
