[Corpora-List] WSD / # WordNet senses / Mechanical Turk

John F Sowa sowa at bestweb.net
Tue Jul 16 02:59:48 CEST 2013

On 7/15/2013 6:15 PM, Kilian Evang wrote:
> Off the top of my head, here's two relevant studies on inter-rater
> reliability for WSD, one for the case of expert annotators and one for
> the case of non-experts:
> http://link.springer.com/article/10.1023/A:1002693207386#page-1

From the abstract at the pointy end of this pointer:
> The exercise identifies the state-of-the-art for fine-grained word sense
> disambiguation, where training data is available, as 74–78% correct, with
> a number of algorithms approaching this level of performance. For systems
> that did not assume the availability of training data, performance was
> markedly lower and also more variable. Human inter-tagger agreement was
> high, with the gold standard taggings being around 95% replicable.

Implication: At roughly 75% accuracy, a state-of-the-art program would make about 75 errors on a 300-word page of text, if every word had to be tagged. That is an average of two errors per 8-word sentence, or five errors per 20-word sentence.

For the "gold" standard, there would still be 15 errors in a 300-word page. Miss Elliott, my high-school English teacher, wouldn't give anyone a gold star for 15 errors per page.
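
The arithmetic behind those counts can be checked with a minimal sketch. It assumes that every word on the page is sense-tagged and that errors are spread evenly, neither of which the abstract states. The function name expected_errors is made up for illustration.

```python
def expected_errors(num_words, accuracy):
    """Expected number of wrongly tagged words.

    Assumption (not from the abstract): every one of the num_words
    words is sense-tagged, and each is wrong with probability
    1 - accuracy.
    """
    return num_words * (1.0 - accuracy)


# Accuracy figures quoted in the abstract: 74-78% for supervised
# systems, around 95% replicability for the gold standard.
# 0.75 is a round midpoint chosen here for the per-sentence figures.
cases = [
    ("system at 74%", 0.74),
    ("system at 78%", 0.78),
    ("gold standard at 95%", 0.95),
]

for label, accuracy in cases:
    per_page = expected_errors(300, accuracy)
    print(f"{label}: about {round(per_page)} errors per 300-word page")

for sentence_length in (8, 20):
    per_sentence = expected_errors(sentence_length, 0.75)
    print(f"{sentence_length}-word sentence at 75%: {per_sentence:g} errors")
```

The 74-78% range gives somewhere between 66 and 78 errors per page, so "about 75" corresponds to taking 75% as a round figure.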

