[Corpora-List] POS-tagger maintenance and improvement

Helmut Schmid schmid at ims.uni-stuttgart.de
Thu Feb 26 09:06:45 CET 2009


Hi Adam,

As the developer of the TreeTagger, I would like to emphasize that I am still maintaining this software and that feedback and suggestions for improvement are very welcome! I am also very interested in collaborations on training the TreeTagger for new languages.

Best regards,

Helmut Schmid

Adam Kilgarriff wrote:
> All,
>
> My lexicography colleagues and I use POS-tagged corpora all the time,
> every day, and very frequently spot systematic errors. (This is for a
> range of languages, but particularly English.) We would dearly like
> to be in a dialogue with the developers of the POS-tagger and/or the
> relevant language models so the tagger+model could be improved in
> response to our feedback. (We have been using standard models rather
> than training our own.) However, it seems, for the taggers and
> language models we use (mainly TreeTagger, also CLAWS) and also for
> other market leaders, all of which seem to be from Universities, the
> developers have little motivation for continuing the improvement of
> their tagger, since incremental improvements do not make for good
> research papers, so there is nowhere for our feedback to go, nor any
> real prospect of these taggers/models improving.
>
> Am I too pessimistic? Are there ways of improving language models
> other than developing bigger and better training corpora - not an
> exercise we have the resources to invest in? Are there commercial
> taggers I should be considering (as, in the commercial world, there is
> motivation for incremental improvements and responding to customer
> feedback)?
> Responses and ideas most welcome.
>
> Adam Kilgarriff
> --
> ================================================
> Adam Kilgarriff
> http://www.kilgarriff.co.uk
> Lexical Computing Ltd http://www.sketchengine.co.uk
> Lexicography MasterClass Ltd http://www.lexmasterclass.com
> Universities of Leeds and Sussex adam at lexmasterclass.com
> ================================================


