[Corpora-List] Japanese and Korean PoS-taggers and lemmatisers

Johannes Goller gollerjo at cis.uni-muenchen.de
Wed Nov 19 14:44:40 CET 2008


Hello Viktor,

You may also want to consider "mecab", which is newer than ChaSen and also has many happy users:

http://sourceforge.net/project/showfiles.php?group_id=177856

Another one, which follows slightly different tokenization and lemmatization standards, is "Juman":

http://www-lab25.kuee.kyoto-u.ac.jp/nl-resource/juman-e.html

A very high-level comparison of several tokenizers is given on this web page (in Japanese):

http://mecab.sourceforge.net/

"mecab" can be installed easily on Red Hat-derived systems using

$ yum install mecab
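
Once installed, the tagger can be driven from a script. The following is only a rough sketch, not an official interface: it assumes the "mecab" executable is on the PATH, that a UTF-8 IPAdic-style dictionary is installed (surface form, a tab, then comma-separated features with the part of speech first and the base form seventh), and the function names parse_mecab_output and tag are made up for illustration. Other dictionaries use different feature layouts, so the field positions would need adjusting.

```python
import shutil
import subprocess

# Sample of MeCab's default output, assuming an IPAdic-style dictionary:
# surface<TAB>pos,pos1,pos2,pos3,conj_type,conj_form,base_form,reading,pronunciation
SAMPLE = (
    "食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ\n"
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ\n"
    "EOS\n"
)


def parse_mecab_output(text):
    """Turn MeCab's default output into (surface, pos, lemma) tuples.

    Assumption: IPAdic feature layout (POS at index 0, base form at index 6).
    """
    tokens = []
    for line in text.splitlines():
        if not line or line == "EOS":  # EOS marks the end of a sentence
            continue
        surface, _, feature_str = line.partition("\t")
        features = feature_str.split(",")
        pos = features[0]
        # Fall back to the surface form when no base form is given ("*").
        if len(features) > 6 and features[6] != "*":
            lemma = features[6]
        else:
            lemma = surface
        tokens.append((surface, pos, lemma))
    return tokens


def tag(sentence):
    """Run the mecab command-line tool on one sentence (hypothetical helper)."""
    if shutil.which("mecab") is None:
        raise RuntimeError("mecab executable not found on PATH")
    result = subprocess.run(
        ["mecab"],
        input=sentence + "\n",
        capture_output=True,
        encoding="utf-8",  # assumption: the installed dictionary is UTF-8
        check=True,
    )
    return parse_mecab_output(result.stdout)


if __name__ == "__main__":
    for surface, pos, lemma in parse_mecab_output(SAMPLE):
        print(surface, pos, lemma)
```

Calling tag("食べた") should give the same kind of tuples as the sample, provided the dictionary matches the assumptions above.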

regards,

Johannes Goller.


> For Japanese, I am a happy user of ChaSen:
>
> http://chasen.naist.jp/hiki/ChaSen/
>
> ... which you can install as debian package, if I remember correctly.
>
> Best regards,
>
> Marco
>
>
> v.pekar at gmail.com wrote:
> > Dear all,
> >
> > Can anyone recommend any part-of-speech taggers and lemmatisers for Japanese and Korean? Freely available tools are preferred, but I'd be interested to know about commercial ones as well.
> >
> > Many thanks,
> >
> > Viktor
> >
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora


