[Corpora-List] POS-tagger maintenance and improvement
redpony at umd.edu
Fri Feb 27 00:03:00 CET 2009
> It's not like wikipedia, where you can pop in, fix
> something and in 1/2 hour you're on your way ...
> it requires a heavy up-front investment, which
> in turn implies a long-term commitment (like most
> software projects) -- so it's not for social butterflies.
I also work with statistical models, so I certainly appreciate the
difficulty of identifying a problem and coming up with a change to the
model that will learn the fix. But I was actually proposing
something a bit different from this (and also different from what Adam
was originally asking for, though I still think that, in the long run,
it would lead to the improvements in statistical tools that he was
asking for). Basically, there are some activities related to what we do that
don't require major expertise, but which could still benefit from
Wikipedia-type organization. To name a few: correcting defects in
corpora, certain annotation tasks, and identifying areas of inadequate
coverage and/or systematic errors in existing tools.
On the other hand, with the ubiquity of language on the web as a
basically free resource, you could definitely argue that the NLP
community, more than any other, has benefited from the "bazaar" model,
so I should just stop complaining. Maybe that's a better way to think