[Corpora-List] WSD / # WordNet senses / Mechanical Turk

Adam Kilgarriff adam at lexmasterclass.com
Tue Jul 16 08:43:58 CEST 2013


Dear Mark, John,

Let me confess to something that has embarrassed me for years: following SENSEVAL-1, I ran a (tiny) experiment to establish inter-annotator agreement, and came up with the 95% figure cited by John.

With the benefit of experience since, I think the findings were not sound: it is most unusual to get a figure that high. I regret having published it (and, worse, having put it in the title of a short paper at EACL-99).

Whether for automatic WSD or even for the gold standard, I agree entirely with John:

> Miss Elliott, my high-school English teacher, wouldn't give
> anyone a gold star [for work like that]

Adam

On 16 July 2013 01:59, John F Sowa <sowa at bestweb.net> wrote:


> On 7/15/2013 6:15 PM, Kilian Evang wrote:
>
>> Off the top of my head, here are two relevant studies on inter-rater
>> reliability for WSD, one for the case of expert annotators and one for
>> the case of non-experts:
>>
>> http://link.springer.com/article/10.1023/A:1002693207386#page-1
>>
>
> From the abstract at the pointy end of this pointer:
>
>> The exercise identifies the state-of-the-art for fine-grained word sense
>> disambiguation, where training data is available, as 74–78% correct, with
>> a number of algorithms approaching this level of performance. For systems
>> that did not assume the availability of training data, performance was
>> markedly lower and also more variable. Human inter-tagger agreement was
>> high, with the gold standard taggings being around 95% replicable.
>>
>
> Implication: For a 300-word page of text, a state-of-the-art program
> would have about 75 errors. That would be an average of two errors
> for 8-word sentences, or five errors for 20-word sentences.
>
> For the "gold" standard, there would still be 15 errors in a 300-word
> page. Miss Elliott, my high-school English teacher, wouldn't give
> anyone a gold star for 15 errors per page.
>
> John
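John's per-page arithmetic can be reproduced with a short Python sketch. It assumes, as his estimate implicitly does, that every word is one sense-tagging decision and that errors are spread uniformly. It also treats the ~95% replicability figure as if it were an accuracy, which follows John's reading. `expected_errors` is a hypothetical helper introduced only for illustration.

```python
def expected_errors(n_words: int, accuracy_pct: float) -> float:
    """Expected mis-tagged words in a text of n_words.

    Hypothetical helper; assumes every word is one tagging decision and
    that errors occur at a uniform rate (a simplification).
    """
    return n_words * (100 - accuracy_pct) / 100


PAGE = 300  # words per page, as in John's example

# State-of-the-art range from the SENSEVAL-1 abstract: 74-78% correct.
best, worst = expected_errors(PAGE, 78), expected_errors(PAGE, 74)
print(f"system, 300-word page: {best:.0f}-{worst:.0f} errors")

# Roughly 75% correct, the figure John's estimate uses.
for length in (PAGE, 8, 20):
    print(f"{length:>3} words at 75%: {expected_errors(length, 75):.0f} errors")

# "Gold" standard at about 95% replicable, read as if it were accuracy.
print(f"gold standard, 300-word page: {expected_errors(PAGE, 95):.0f} errors")
```

With the full 74–78% range, the estimate for a system is roughly 66 to 78 errors per page. John's figure of 75 is the value at 75%.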

-- 
========================================
Adam Kilgarriff <http://www.kilgarriff.co.uk/>
adam at lexmasterclass.com
Director, Lexical Computing Ltd <http://www.sketchengine.co.uk/>

Visiting Research Fellow University of Leeds<http://leeds.ac.uk>

*Corpora for all* with the Sketch Engine <http://www.sketchengine.co.uk>

*DANTE: a lexical database for English* <http://www.webdante.com>

========================================