[Corpora-List] WSD / # WordNet senses / Mechanical Turk

Andy Schwartz andy.schwartz at gmail.com
Tue Jul 16 23:25:47 CEST 2013

I recently collaborated with a crowdsourcing researcher to answer a similar question, among others. We found, over a large sample of words, that every additional (coarse-grained) sense resulted in approximately 3% lower turker accuracy (accuracy here is the turkers' agreement with expert annotations from a SemEval-2007 task).
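To make the two quantities in that sentence concrete, here is a minimal sketch in Python. It assumes accuracy is simply the fraction of turker labels that match the expert label, and it fits an ordinary least-squares line of per-word accuracy against number of senses. The function names and the toy numbers are made up for illustration; they are not data from the paper, and the paper's own statistical model may well differ.

```python
def accuracy(turker_labels, expert_label):
    """Fraction of turker labels that match the expert annotation."""
    if not turker_labels:
        raise ValueError("need at least one turker label")
    matches = sum(1 for label in turker_labels if label == expert_label)
    return matches / len(turker_labels)


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (change in y per unit of x)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy data: (number of coarse senses, per-word turker accuracy).
toy = [(2, 0.80), (3, 0.77), (4, 0.74), (5, 0.71)]
senses = [s for s, _ in toy]
accs = [a for _, a in toy]

slope = ols_slope(senses, accs)
print(f"accuracy change per additional sense: {slope:+.3f}")
```

A negative slope of this kind is what "less accuracy per additional sense" amounts to in the simplest linear reading.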

Adam Kapelner, Krishna Kaliannan, H. Andrew Schwartz, Lyle Ungar and Dean Foster. 2012. New Insights from Coarse Word Sense Disambiguation in the Crowd. In COLING-2012. PDF: <http://www.seas.upenn.edu/%7Ehansens/COLing2012-poster-kapelner.pdf>

A few other findings, from the abstract:

(a) the number of rephrasings within a sense definition is associated with higher accuracy; (b) as word frequency increases, accuracy decreases even if the number of senses is kept constant; and (c) spending more time is associated with a decrease in accuracy.
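On the inter-rater reliability side of the question quoted below, one common chance-corrected agreement statistic for more than two raters is Fleiss' kappa. The following is only a sketch of the standard formula, not necessarily the measure used in the paper. The function name and the toy counts are hypothetical, and it assumes every item was labelled by the same number of raters.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[item][sense] is the number of
    raters who chose that sense for that item. Every item must have the
    same number of raters (at least 2)."""
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_cats = len(counts[0])
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    for row in counts:
        if len(row) != n_cats or sum(row) != n_raters:
            raise ValueError("every item needs the same senses and rater count")

    # Observed agreement: per item, the share of rater pairs that agree,
    # averaged over items.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement, from the overall share of ratings given to each sense.
    total = n_items * n_raters
    p_exp = sum(
        (sum(row[j] for row in counts) / total) ** 2 for j in range(n_cats)
    )

    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings use one sense")
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical toy table: 4 contexts of a 2-sense verb, 5 raters each.
toy_counts = [[5, 0], [4, 1], [1, 4], [0, 5]]
print(f"Fleiss' kappa: {fleiss_kappa(toy_counts):.3f}")
```

Computing this separately for 2-sense, 5-sense and 10-sense words would be one way to compare reliability across the groups the question describes.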



> Sorry if this is a basic question for computational linguists; I'm a corpus
> linguist.
> I'm wondering if there has been much research on inter-rater reliability
> of word sense disambiguation by raters on something like Mechanical Turk.
> For example:
> -- Given some verbs that have 5 word senses each in WordNet (e.g. the
> words tag, tame, taste, temper), how well do native speakers agree on the
> word sense for these verbs in context?
> -- How does this inter-rater reliability change for words that might have
> just two senses (e.g. the verbs taint, tamper, tan, tank) or maybe 10
> senses (e.g. the verbs shift, spread, stop, trim)? (In other words,
> intuition suggests that for words with two WordNet senses, there might be
> higher inter-rater reliability than for those words with five senses, and
> that for words with 10 WN senses, inter-rater reliability would be pretty
> bad.)
> -- Semantically, which kinds of 2 / 5 / 10 WN entry words have the best
> inter-rater reliability, and which have the worst?
> Thanks in advance.
> Mark Davies
> ============================================
> Mark Davies
> Professor of Linguistics / Brigham Young University
> http://davies-linguistics.byu.edu/
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================

--
H. Andrew Schwartz <http://www.seas.upenn.edu/%7Ehansens/>
Postdoctoral Fellow, Computer & Info. Science /
Lead Research Scientist, WWBP <http://wwbp.org>, Pos. Psychol. Center
University of Pennsylvania
215-746-5085
