[Corpora-List] Japanese and Korean PoS-taggers and lemmatisers

Johannes Goller gollerjo at cis.uni-muenchen.de
Wed Nov 19 14:44:40 CET 2008

Hello Viktor,

you may also want to consider using "mecab", which is newer than ChaSen and has many happy users, too:


Another one, which follows slightly different tokenization and lemmatization standards, is "Juman":


A very high-level comparison of several tokenizers is given on this web page in Japanese:


"mecab" can be easily installed on Red Hat-derived systems using

$ yum install mecab
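
Once it is installed, the tagger's output is easy to post-process. As a rough sketch (not an official interface), the Python snippet below pulls surface form, part of speech and lemma out of mecab's default text output. It assumes the IPADIC feature layout, in which the first comma-separated field is the part of speech and the seventh is the base form; other dictionaries order their fields differently. The names Token, parse_mecab_output and SAMPLE are made up for this illustration, and SAMPLE is hand-written example text rather than output captured from a real run.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str  # the token as it appears in the text
    pos: str      # top-level part of speech (first feature field)
    lemma: str    # base form (seventh feature field under IPADIC)


def parse_mecab_output(text: str) -> List[Token]:
    """Parse mecab's default output: one 'surface<TAB>features' line per
    token, with a line reading 'EOS' after each sentence."""
    tokens = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue
        surface, _, feature_str = line.partition("\t")
        features = feature_str.split(",")
        pos = features[0] if features[0] else "*"
        # Unknown words may have no base form ("*" or a missing field);
        # fall back to the surface form in that case.
        if len(features) > 6 and features[6] not in ("", "*"):
            lemma = features[6]
        else:
            lemma = surface
        tokens.append(Token(surface, pos, lemma))
    return tokens


# Hand-written sample in the assumed IPADIC layout (illustrative only).
SAMPLE = (
    "すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ\n"
    "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ\n"
    "食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ\n"
    "EOS\n"
)

if __name__ == "__main__":
    for token in parse_mecab_output(SAMPLE):
        print(token.surface, token.pos, token.lemma, sep="\t")
```

To run this on real text, the string passed to parse_mecab_output could be captured from the command-line tool, for example with subprocess.run(["mecab"], input=sentence, capture_output=True, text=True).stdout. Note that the plain comma split is naive: a token whose own features contain a comma would need more careful handling.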


Johannes Goller.

> For Japanese, I am a happy user of ChaSen:
> http://chasen.naist.jp/hiki/ChaSen/
> ... which you can install as a Debian package, if I remember correctly.
> Best regards,
> Marco
> v.pekar at gmail.com wrote:
> > Dear all,
> >
> > Can anyone recommend any part-of-speech taggers and lemmatisers for Japanese and Korean? Freely available tools are preferred, but I'd be interested to know about commercial ones as well.
> >
> > Many thanks,
> >
> > Viktor
> >
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
