May I add this (from Fort et al. 2011: http://www.mitpressjournals.org/doi/pdf/10.1162/COLI_a_00057): "in (Bhardwaj et al. 2010), [...] it is shown that, for their task of word sense disambiguation, a small number of trained annotators are superior to a larger number of untrained Turkers. On that point, their results contradict that of (Snow et al. 2008), whose task was much simpler (the number of senses per word was 3 for the latter, versus 9.5 for the former)"
Bhardwaj, Vikas, Rebecca Passonneau, Ansaf Salleb-Aouissi, and Nancy Ide. 2010. Anveshan: A tool for analysis of multiple annotators' labeling behavior. In Proceedings of the Fourth Linguistic Annotation Workshop (LAW IV), pages 47–55, Uppsala.
Karën Fort
ATER ENSMN
Loria, Sémagramme team
Office C303
+33 (0)3 54 95 86 54
http://www.loria.fr/~fortkare/
----- Original Message -----
> From: "Benjamin Van Durme" <vandurme at cs.jhu.edu>
> To: corpora at uib.no
> Sent: Tuesday, July 16, 2013 14:32:38
> Subject: Re: [Corpora-List] WSD / # WordNet senses / Mechanical Turk
> Rion Snow, Brendan O'Connor, Daniel Jurafsky and Andrew Y. Ng. Cheap
> and Fast - But is it Good? Evaluating Non-Expert Annotations for
> Natural Language Tasks. EMNLP 2008.
> "We collect 10 annotations for each of 177 examples of the noun
> “president” for the three senses given in SemEval. [...]
> performing simple majority voting (with random tie-breaking) over
> annotators results in a rapid accuracy plateau at a very high rate of
> 0.994 accuracy. In fact, further analysis reveals that there was
> a single disagreement between the averaged non-expert vote and the
> gold standard; on inspection it was observed that the annotators
> were strongly against the original gold label (9-to-1 against), and that
> it was in fact found to be an error in the original gold standard
> annotation. After correcting this error, the non-expert accuracy
> is 100% on the 177 examples in this task. This is a specific example
> where non-expert annotations can be used to correct expert
> annotations."
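For concreteness, the aggregation scheme Snow et al. describe (simple majority voting with random tie-breaking over the annotations collected per item) can be sketched roughly as follows. The function name and the seeded RNG are my own choices for illustration, not anything from their paper:

```python
import random
from collections import Counter

def majority_vote(labels, rng=None):
    """Return the most frequent label; break ties uniformly at random."""
    rng = rng or random.Random(0)  # seeded only for reproducibility here
    counts = Counter(labels)
    top = max(counts.values())
    winners = [label for label, c in counts.items() if c == top]
    return rng.choice(winners)

# e.g. 10 Turker annotations for one instance of "president":
print(majority_vote(["sense1"] * 9 + ["sense2"]))
```

Running this per item and comparing the winning label against the gold standard is all that is needed to replicate the kind of accuracy curve they report.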
> Xuchen Yao, Benjamin Van Durme and Chris Callison-Burch. Expectations
> of Word Sense in Parallel Corpora. NAACL Short. 2012.
> "2 Turker Reliability
> While Amazon’s Mechanical Turk (MTurk) has been considered in the
> past for constructing lexical semantic resources (e.g., (Snow et al.,
> 2008; Akkaya et al., 2010; Parent and Eskenazi, 2010; Rumshisky,
> 2011)), word sense annotation is sensitive to subjectivity and
> usually achieves low agreement rate even among experts. Thus we
> first asked Turkers to re-annotate a sample of existing gold-standard
> data. With an eye towards cost savings, we also considered how many
> Turkers would be needed per item to produce results of sufficient
> quality. [...]
> Turkers were presented sentences from the test portion of the word
> sense induction task of SemEval-2007 (Agirre and Soroa, 2007),
> covering 2,559 instances of 35 nouns, expert-annotated with OntoNotes
> (Hovy et al., 2006) senses. [...]
> We measure inter-coder agreement using Krippendorff’s Alpha
> (Krippendorff, 2004; Artstein and Poesio, 2008), [...]"
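For anyone wanting to compute the same agreement statistic, here is a minimal sketch of Krippendorff's Alpha for nominal labels with complete data (one list of labels per item), built from the standard coincidence-matrix formulation in Krippendorff (2004). It is a bare-bones illustration, not the authors' actual implementation, and it does not handle interval/ordinal metrics or missing values:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha for nominal data.

    `units` is a list of lists: the labels assigned to each item,
    one inner list per item. Items with fewer than two labels
    carry no pairable information and are skipped.
    """
    coincidence = Counter()  # o[c,k]: weighted ordered label-pair counts
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidence[(a, b)] += 1 / (m - 1)
    totals = Counter()       # n[c]: marginal count for each label
    for (a, _), w in coincidence.items():
        totals[a] += w
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidence.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b)
    if expected == 0:
        return 1.0  # only one label ever used: no disagreement possible
    return 1.0 - (n - 1) * observed / expected

# Two items, two coders each, perfect agreement:
print(krippendorff_alpha_nominal([["a", "a"], ["b", "b"]]))  # 1.0
```

Values near 1.0 indicate reliable annotation; Alpha drops toward (and below) 0 as disagreement approaches, then exceeds, what chance would predict.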