Have a look at
If you have to build your own tagger (and care more about a good work/accuracy ratio than about squeezing the last 0.1% out of it), I'd recommend (besides SVMTool, which also seems to be trainable):

* TnT by Thorsten Brants (non-commercial license only), which is based on Markov model tagging (smoothed HMM); TnT reaches 96.7% on PTB
* CRF++ by Taku Kudo (GPL), which, as the name suggests, uses Conditional Random Fields and allows you to use arbitrary features (or feature combinations)
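To make the "smoothed HMM" idea concrete, here is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It is not TnT's actual model: the add-one smoothing, the toy training data, and all function names are assumptions chosen only to keep the example short and self-contained.

```python
import math
from collections import defaultdict

START = "<s>"  # pseudo-tag marking the beginning of a sentence


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sent in tagged_sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(emit)
    return trans, emit, tags, vocab


def log_prob(counts, context, event, n_outcomes):
    """Add-one smoothed log P(event | context) (a toy smoothing scheme)."""
    total = sum(counts[context].values())
    return math.log((counts[context].get(event, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    # best[i][t] = (log score of the best path ending in tag t, backpointer)
    best = [{}]
    for t in tags:
        score = (log_prob(trans, START, t, n_tags)
                 + log_prob(emit, t, words[0], n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + log_prob(trans, p, t, n_tags))
                 for p in tags),
                key=lambda pair: pair[1],
            )
            score += log_prob(emit, t, words[i], n_words)
            best[i][t] = (score, prev_tag)
    # Follow the backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))


# Toy training data, invented for illustration only.
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
MODEL = train_hmm(TRAIN)
print(viterbi(["the", "cat", "barks"], *MODEL))
```

A real tagger would use trigram transitions, better smoothing, and a dedicated treatment of unknown words (e.g. based on word suffixes); the sketch only shows how transition and emission scores are combined during decoding.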
There's also a nice overview of POS tagging results in the following paper: http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/papers/emnlp05bidir.pdf (Yoshimasa Tsuruoka and Jun'ichi Tsujii, "Bidirectional Inference with the Easiest-First Strategy for Tagging Sequence Data", HLT/EMNLP 2005)
For other languages (e.g. German), it often makes sense to combine ML-based tagging with some kind of rule-based postprocessing to fix errors that are due to the model's inability to get long-range dependencies right.
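As a rough sketch of what such a postprocessing step could look like: the rule below retags a verb as an infinitive when a finite modal verb occurs earlier in the same clause, which is the kind of dependency a tagger with a short context window tends to miss. The rule itself, the function name, and the example sentence are hypothetical illustrations, and the sketch assumes STTS-style tags (VMFIN, VVFIN, VVINF, "$," and "$.").

```python
def fix_infinitive_after_modal(tagged):
    """Retag VVFIN as VVINF if a finite modal (VMFIN) precedes it in the clause.

    `tagged` is a list of (word, tag) pairs. Clause boundaries are
    approximated by punctuation tags, which is a simplifying assumption.
    """
    fixed = []
    modal_seen = False
    for word, tag in tagged:
        if tag in {"$,", "$."}:
            # Punctuation ends the (approximate) clause: forget the modal.
            modal_seen = False
        elif tag == "VMFIN":
            modal_seen = True
        elif tag == "VVFIN" and modal_seen and word.endswith("en"):
            # Only forms ending in -en are ambiguous between a finite
            # plural verb and an infinitive.
            tag = "VVINF"
        fixed.append((word, tag))
    return fixed


# Hypothetical tagger output in which "fahren" was mis-tagged as finite.
tagger_output = [
    ("Sie", "PPER"), ("will", "VMFIN"), ("morgen", "ADV"),
    ("nach", "APPR"), ("Berlin", "NE"), ("fahren", "VVFIN"), (".", "$."),
]
print(fix_infinitive_after_modal(tagger_output))
```

In practice you would derive such rules from an error analysis of your own tagger's output rather than write them up front.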
Best wishes, Yannick
On Tuesday 21 October 2008 05:12, wang xiaolin wrote:
> Hello , everyone.
> I've started to work on POS tagging recently. I have been following the
> course of NLTK by Bird, Klein and Loper, and have learned a variety of
> basic methods including default tagger, regular expression tagger,
> unigram tagger and n-gram tagger. My question is whether there is a
> well-known, practical state-of-the-art tagger, or any recent survey that
> ranks the various tagging methods and practical tagging systems. For
> example, I used to work on text categorization. I know that SVM is (most
> people agree) the state-of-the-art classifier, and that Naïve Bayes is
> simple and works a little worse than SVM, but is still practical. KNN is
> time-consuming in the test phase, and its performance is between SVM's and
> Naïve Bayes'. Sebastiani (2002) gives a wonderful survey on this topic.
> Can anyone give me equivalent ideas and information for the POS tagging
> domain? Many thanks.
> Best wishes
> Arthur Wang
--
Yannick Versley
Seminar für Sprachwissenschaft, Abt. Computerlinguistik
Wilhelmstr. 19, 72074 Tübingen
Tel.: (07071) 29 77352