[Corpora-List] POS-tagger maintenance and improvement

John F. Sowa sowa at bestweb.net
Wed Feb 25 18:57:27 CET 2009

Adam and Eric,

AK> Am I too pessimistic? Are there ways of improving language
> models other than developing bigger and better training corpora
> -- not an exercise we have the resources to invest in? Are
> there commercial taggers I should be considering (as, in the
> commercial world, there is motivation for incremental improvements
> and responding to customer feedback)?

At our company (VivoMind Intelligence, Inc.) we have been getting good results by using a high-speed analogy engine. For some slides that illustrate three applications, see


All three of those applications processed plain text with no tagging. The last slide of that talk has URLs of related papers.

EA> As others have commented, TreeTagger models for other languages
> are also derived from a PoS-tagged corpus, which suggest the only
> way to eradicate systematic errors is to "correct" the tagging
> in the training corpus, or perhaps to use a different corpus
> altogether.

We have obtained good results by using multiple agents that rely on different methods, data, or paradigms. Systematic errors made by one agent can then be corrected by evidence from the other agents.
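As a rough illustration of the general idea (not VivoMind's implementation, which is not described here), the simplest way to let several taggers correct one another is per-token majority voting. The sketch below assumes each tagger has already produced one tag per token; the tagger names, the tags, and the `combine_tags` helper are all hypothetical.

```python
from collections import Counter


def combine_tags(tag_sequences, priority):
    """Combine per-token POS tags from several taggers by majority vote.

    tag_sequences: dict mapping a (hypothetical) tagger name to its list of
        tags, one tag per token; every list must cover the same tokens.
    priority: tagger names (keys of tag_sequences), most trusted first;
        consulted only to break ties between equally frequent tags.
    """
    lengths = {len(tags) for tags in tag_sequences.values()}
    if len(lengths) != 1:
        raise ValueError("expected one or more taggers with equally long tag lists")
    n_tokens = lengths.pop()
    combined = []
    for i in range(n_tokens):
        # Count how many taggers proposed each tag for token i.
        votes = Counter(tags[i] for tags in tag_sequences.values())
        top = max(votes.values())
        tied = {tag for tag, count in votes.items() if count == top}
        # Prefer the tag of the most trusted tagger among the tied candidates.
        for name in priority:
            if tag_sequences[name][i] in tied:
                combined.append(tag_sequences[name][i])
                break
        else:
            # No listed tagger proposed a tied tag: fall back to a stable choice.
            combined.append(min(tied))
    return combined


if __name__ == "__main__":
    # Hypothetical outputs of three taggers for "Time flies like an arrow".
    outputs = {
        "tagger_a": ["NN", "VBZ", "IN", "DT", "NN"],
        "tagger_b": ["NN", "NNS", "VBP", "DT", "NN"],
        "tagger_c": ["NN", "VBZ", "IN", "DT", "NN"],
    }
    print(combine_tags(outputs, ["tagger_a", "tagger_b", "tagger_c"]))
    # prints ['NN', 'VBZ', 'IN', 'DT', 'NN']: tagger_b is outvoted on "flies" and "like"
```

Voting only repairs errors on which the taggers disagree. If all of them were trained on the same corpus, they may share its systematic errors and outvote the correct tag, which is why the agents need to differ in method or data.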

For the slides of another talk that discusses the multi-agent approach, see


John Sowa

