[Corpora-List] WSD / # WordNet senses / Mechanical Turk

Karen Fort karen.fort at loria.fr
Tue Jul 16 15:36:13 CEST 2013


Hi all,

May I add this (from Fort et al., 2011: http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00057): "in (Bhardwaj et al. 2010), [...] it is shown that, for their task of word sense disambiguation, a small number of trained annotators are superior to a larger number of untrained Turkers. On that point, their results contradict those of (Snow et al. 2008), whose task was much simpler (the number of senses per word was 3 for the latter, versus 9.5 for the former)"

Bhardwaj, Vikas, Rebecca Passonneau, Ansaf Salleb-Aouissi, and Nancy Ide. 2010. Anveshan: A tool for analysis of multiple annotators’ labeling behavior. In Proceedings of the Fourth Linguistic Annotation Workshop (LAW IV), pages 47–55, Uppsala.

Karën Fort
ATER ENSMN
Loria, équipe Sémagramme
Bureau C303
+33 (0)3 54 95 86 54
http://www.loria.fr/~fortkare/

----- Mail original -----
> De: "Benjamin Van Durme" <vandurme at cs.jhu.edu>
> À: corpora at uib.no
> Envoyé: Mardi 16 Juillet 2013 14:32:38
> Objet: Re: [Corpora-List] WSD / # WordNet senses / Mechanical Turk
>
> Rion Snow, Brendan O'Connor, Daniel Jurafsky and Andrew Y. Ng. Cheap
> and Fast - But is it Good? Evaluating Non-Expert Annotations for
> Natural Language Tasks. EMNLP 2008.
> http://ai.stanford.edu/~rion/papers/amt_emnlp08.pdf
>
> "We collect 10 annotations for each of 177 examples of the noun
> “president” for the three senses given in SemEval. [...]
> performing simple majority voting (with random tie-breaking) over
> annotators results in a rapid accuracy plateau at a very high rate
> of 0.994 accuracy. In fact, further analysis reveals that there was
> only a single disagreement between the averaged non-expert vote and
> the gold standard; on inspection it was observed that the annotators
> voted strongly against the original gold label (9-to-1 against), and
> that it was in fact found to be an error in the original gold
> standard annotation. After correcting this error, the non-expert
> accuracy rate is 100% on the 177 examples in this task. This is a
> specific example where non-expert annotations can be used to correct
> expert annotations."
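
[Editor's sketch] The aggregation scheme Snow et al. describe above, simple majority voting with random tie-breaking over the per-item annotations, can be sketched in a few lines of Python. The sense labels and annotation list below are made-up illustrations, not data from the paper:

```python
import random
from collections import Counter

def majority_vote(labels, rng=None):
    """Return the most frequent label; ties are broken uniformly at random."""
    rng = rng or random.Random(0)  # fixed seed so tie-breaks are reproducible
    counts = Counter(labels)
    top = max(counts.values())
    winners = sorted(lab for lab, c in counts.items() if c == top)
    return rng.choice(winners)

# Hypothetical item: 10 Turker annotations, 9-to-1 for one sense
# (the same vote pattern as the corrected gold-standard item above)
annotations = ["sense1"] * 9 + ["sense2"]
print(majority_vote(annotations))  # -> sense1
```

With more annotators per item, the plurality label stabilizes, which is why accuracy plateaus as the number of collected judgments grows.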
>
> Xuchen Yao, Benjamin Van Durme and Chris Callison-Burch. Expectations
> of Word Sense in Parallel Corpora. NAACL Short. 2012.
> http://cs.jhu.edu/~vandurme/papers/YaoVanDurmeCallison-BurchNAACL12.pdf
>
>
> "2 Turker Reliability
>
> While Amazon’s Mechanical Turk (MTurk) has been considered in the
> past for constructing lexical semantic resources (e.g., (Snow et
> al., 2008; Akkaya et al., 2010; Parent and Eskenazi, 2010;
> Rumshisky, 2011)), word sense annotation is sensitive to
> subjectivity and usually achieves a low agreement rate even among
> experts. Thus we first asked Turkers to re-annotate a sample of
> existing gold-standard data. With an eye towards cost savings, we
> also considered how many Turkers would be needed per item to
> produce results of sufficient quality.
>
> Turkers were presented sentences from the test portion of the word
> sense induction task of SemEval-2007 (Agirre and Soroa, 2007),
> covering 2,559 instances of 35 nouns, expert-annotated with
> OntoNotes (Hovy et al., 2006) senses. [...]
>
> We measure inter-coder agreement using Krippendorff’s Alpha
> (Krippendorff, 2004; Artstein and Poesio, 2008), [...]"
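
[Editor's sketch] For readers unfamiliar with the measure mentioned above: Krippendorff's Alpha compares observed to expected disagreement, alpha = 1 - D_o/D_e, and copes with varying numbers of coders per item. A minimal version for nominal labels (the toy data is invented, not the SemEval annotations):

```python
from collections import Counter

def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha for nominal data.

    `units` is a list of label lists, one per annotated item; items with
    fewer than two labels are dropped (no pairable values).
    """
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)  # total number of pairable values
    # Observed disagreement: mismatched label pairs within each unit
    d_o = 0.0
    for u in units:
        m = len(u)
        matches = sum(c * (c - 1) for c in Counter(u).values())
        d_o += (m * (m - 1) - matches) / (m - 1)
    d_o /= n
    # Expected disagreement: mismatched pairs over the pooled label counts
    totals = Counter(lab for u in units for lab in u)
    d_e = sum(a * b for x, a in totals.items()
              for y, b in totals.items() if x != y)
    d_e /= n * (n - 1)
    if d_e == 0:  # every value identical: perfect (vacuous) agreement
        return 1.0
    return 1.0 - d_o / d_e

# Toy data: two coders agreeing on two of three items (roughly 0.44)
print(krippendorff_alpha_nominal([["s1", "s1"], ["s2", "s2"], ["s1", "s2"]]))
```

Alpha is 1 for perfect agreement, 0 for chance-level agreement, and can go negative when coders disagree systematically, which is why it is a common choice for the low-agreement word sense setting the quote describes.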
>


