[Corpora-List] Using MTurk for markup tasks (was Cost of part of speech tagging)

Alexandre Rafalovitch arafalov at gmail.com
Tue Dec 26 22:53:04 CET 2006

On 12/26/06, Mike Maxwell <maxwell at umiacs.umd.edu> wrote:

> Alexandre Rafalovitch wrote:
> > An interesting approach would be to use Amazon Mechanical Turk for
> > these kinds of tasks.
> > ...
> > Has anybody else given a thought to this?


> Don't know what languages you're interested in. I have thought about
> "wikifying" other sorts of projects (like finding and keeping track of
> on-line computational resources, or building bilingual text collections)
> for "low density" languages.

Actually, wikification is a different, though also interesting, idea.
Wikification would be about content presentation and markup, while
MTurk would be about the workflow and process of actually marking up
the text. I think using a generic wiki for POS marking may not be very
efficient. A specialised programme that lets annotators do it quickly
would be more effective. Such programmes exist as standalone
applications, but not as an online interface, and certainly not as an
MTurk interface yet (AFAIK). Obviously, if workflow and presentation
could be combined into one interface, the benefits would compound.

> My guess is that "wikification" (including the Amazon Mechanical Turk
> under this) will work best for languages where there are a substantial
> number of speakers with idle time, sufficient income to afford the
> computer and network connection, and sufficient education for the
> specific annotation task.

My proposed target is students and research assistants in the fields
of Linguistics and Computational Linguistics. They (should) have the
training, access to a network connection (through their universities),
and a need to earn income in a time-flexible fashion around their
other duties/studies. Languages are obviously an issue, but with the
already distributed nature of MTurk, it might be possible to reach the
language speakers where they are rather than where you would need them
to be with a centralised architecture.

