[Corpora-List] POS-tagger maintenance and improvement

Jimmy O'Regan joregan at gmail.com
Thu Feb 26 22:40:16 CET 2009

2009/2/26 Linas Vepstas <linasvepstas at gmail.com>:
> BTW, I am *very* interested in automatically learning
> new disjuncts (link-grammar rules) via corpus statistics
> -- I think this is an excellent line of research, PhD level,
> for this parser, or any other NLP system, POS tagger, etc.

Marcin Miłkowski (LanguageTool) has a blog post about using Wikipedia edits as a corpus of errors: http://morfologik.blogspot.com/2007/01/wikipedia-history-diff-as-revision.html

He has done more work since then towards automating rule construction; it might be worth your while getting in contact with him.
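
For a rough idea of what mining revision history for errors can look like, here is a minimal sketch. It is not Marcin's method, just an illustration of the general idea: diff two revisions of the same sentence at word level and keep the short replacements as candidate (error, correction) pairs. The function name `extract_corrections`, the `max_span` cutoff and the sample sentences are all made up for the example; real Wikipedia revisions would need sentence alignment and filtering of vandalism and content changes first.

```python
import difflib


def extract_corrections(old_text, new_text, max_span=3):
    """Return (before, after) word-sequence pairs replaced between two revisions.

    Hypothetical helper: both inputs are assumed to be whitespace-tokenised
    versions of the same sentence.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only short replacements; long ones are more likely to be
        # content rewrites than corrections of a language error.
        if tag == "replace" and (i2 - i1) <= max_span and (j2 - j1) <= max_span:
            pairs.append((" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return pairs


# Invented example revisions, not taken from Wikipedia.
old_rev = "He have went to the libary yesterday ."
new_rev = "He has gone to the library yesterday ."
print(extract_corrections(old_rev, new_rev))
```

Counting how often each pair turns up across many revisions would give the sort of corpus statistics from which candidate rules could be proposed.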
