[Corpora-List] Using MTurk for markup tasks (was Cost of part

Dragomir R. Radev radev at umich.edu
Tue Dec 26 22:53:00 CET 2006


> Alexandre Rafalovitch wrote:

> > An interesting approach would be to use Amazon Mechanical Turk for

> > these kinds of tasks.

> > ...

> > Has anybody else given a thought to this?


> Don't know what languages you're interested in. I have thought about

> "wikifying" other sorts of projects (like finding and keeping track of

> on-line computational resources, or building bilingual text

> collections)

Have you looked at www.aclweb.org/aclwiki?


> for "low density" languages. I have never actually tried this, but it

> may be instructive to look at the languages for which there are

> substantial Wikipedia and Wiktionary resources. Last time I looked, the

> usual suspects (the major and some "minor" European languages, plus

> Japanese) had at least 100k Wikipedia articles, while there was a

> slightly wider variety of languages with at least 10k Wikipedia articles

> (including Arabic (= MSA), Persian, Hebrew, Indonesian, Korean,

> Malay, Thai, Turkish and Chinese). For comparison, the English

> Wikipedia has 1.5 million articles.


> My guess is that "wikification" (counting Amazon Mechanical Turk

> as one form of it) will work best for languages where there are a substantial

> number of speakers with idle time, sufficient income to afford the

> computer and network connection, and sufficient education for the

> specific annotation task.

> --

> Mike Maxwell

> maxwell at umiacs.umd.edu




Dragomir R. Radev Associate Professor
SI, CSE, Ling U. Michigan, Ann Arbor
http://www.eecs.umich.edu/~radev radev at umich.edu
