You may also want to consider using "mecab", which is newer than ChaSen and also has many happy users:
Another one, which follows slightly different tokenization and lemmatization standards, is "Juman":
A very high-level comparison of several tokenizers is given on this web page (in Japanese):
"mecab" can be easily installed on Red Hat-derived systems using:
    yum install mecab
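In case it helps, here is a rough sketch of how one might pull part-of-speech tags and lemmas out of mecab's plain-text output. It assumes the default output format with an IPADIC-style dictionary, i.e. one token per line as "surface<TAB>comma-separated features", with the top-level POS in the first feature field, the base form in the seventh, and "EOS" closing each sentence. Other dictionaries lay out their fields differently, so the indices would need adjusting. The function name parse_mecab_output and the SAMPLE string are made up for illustration; the sample is typed by hand rather than captured from a real run.

```python
def parse_mecab_output(text):
    """Turn mecab-style output into (surface, pos, lemma) tuples.

    Assumes IPADIC-style feature fields: field 0 is the top-level POS
    and field 6 is the base (dictionary) form. This layout is an
    assumption and differs for other dictionaries.
    """
    tokens = []
    for line in text.splitlines():
        # Skip blank lines and the end-of-sentence marker.
        if not line or line == "EOS":
            continue
        surface, _, features = line.partition("\t")
        fields = features.split(",")
        pos = fields[0]
        # Fall back to the surface form when no base form is given ("*").
        has_lemma = len(fields) > 6 and fields[6] != "*"
        lemma = fields[6] if has_lemma else surface
        tokens.append((surface, pos, lemma))
    return tokens


# Hand-written sample in the assumed format (not real captured output).
SAMPLE = (
    "食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ\n"
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
    "EOS\n"
)

if __name__ == "__main__":
    for surface, pos, lemma in parse_mecab_output(SAMPLE):
        print(surface, pos, lemma)
```

To use it on real text, one would pipe a sentence through the mecab command and hand the resulting output to parse_mecab_output.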
> For Japanese, I am a happy user of ChaSen:
> ... which you can install as a Debian package, if I remember correctly.
> Best regards,
> v.pekar at gmail.com wrote:
> > Dear all,
> > Can anyone recommend any part-of-speech taggers and lemmatisers for Japanese and Korean? Freely available tools are preferred, but I'd be interested to know about commercial ones as well.
> > Many thanks,
> > Viktor
> Corpora mailing list
> Corpora at uib.no