[Corpora-List] POS-tagger maintenance and improvement

Chris Dyer
Fri Feb 27 00:03:00 CET 2009

> It's not like Wikipedia, where you can pop in, fix
> something and in 1/2 hour you're on your way ...
> it requires a heavy up-front investment, which
> in turn implies a long-term commitment (like most
> software projects) -- so it's not for social butterflies.
I also work with statistical models, so I certainly appreciate the difficulty of identifying a problem and coming up with a change to the model that will learn the fix. But I was actually proposing something a bit different (and also different from what Adam was originally asking for, though I still think that, in the long run, it would lead to the improvements in statistical tools that he wanted). Basically, there are some activities related to what we do that don't require major expertise but could still benefit from Wikipedia-style organization. To name a few: correcting defects in corpora, certain annotation tasks, and identifying areas of inadequate coverage and/or systematic errors in existing tools.

On the other hand, with the ubiquity of language on the web as a basically free resource, you could definitely argue that the NLP community, more than any other, has benefited from the "bazaar" model, so I should just stop complaining. Maybe that's a better way to think about this.


