[Corpora-List] Using MTurk for markup tasks (was Cost of part of speech tagging)

Mike Maxwell maxwell at umiacs.umd.edu
Tue Dec 26 22:13:01 CET 2006

Alexandre Rafalovitch wrote:

> An interesting approach would be to use Amazon Mechanical Turk for
> this kinds of tasks.
> ...
> Has anybody else given a thought to this?

Don't know what languages you're interested in. I have thought about
"wikifying" other sorts of projects (like finding and keeping track of
on-line computational resources, or building bilingual text collections)
for "low density" languages. I have never actually tried this, but it
may be instructive to look at the languages for which there are
substantial Wikipedia and Wiktionary resources. Last time I looked, the
usual suspects (the major and some "minor" European languages, plus
Japanese) had at least 100k Wikipedia articles, while there was a
slightly wider variety of languages with at least 10k Wikipedia articles
(including Arabic (= MSA), Persian, Hebrew, Bahasa Indonesia, Korean,
Malay, Thai, Turkish and Chinese). For comparison, the English
Wikipedia has 1.5 million articles.

My guess is that "wikification" (in which I include Amazon Mechanical
Turk) will work best for languages that have a substantial number of
speakers with idle time, sufficient income to afford a computer and
network connection, and sufficient education for the specific
annotation task.

Mike Maxwell
maxwell at umiacs.umd.edu
