[Corpora-List] Simple tokenizer for Chinese

Hongyin Tao bbs.lists at gmail.com
Mon Oct 27 19:34:52 CET 2008


One of the best tokenizers is ICTCLAS by researchers from the Chinese Academy of Sciences.

http://www.ictclas.org/

If you have more questions regarding Chinese corpora and corpus tools, visit

http://www.corpus4u.org

Hongyin Tao
Professor of Chinese Language and Linguistics & Applied Linguistics and TESL
University of California, Los Angeles (UCLA)
Department of Asian Languages and Cultures
290 Royce Hall
Los Angeles, CA 90095-1540
Tel: (310) 206-6872
Fax: (310) 825-8808

On Mon, Oct 27, 2008 at 3:04 AM, Emiliano Guevara <emiliano.guevara at unibo.it> wrote:


> Dear all,
>
> could you please point me to simple tokenizers for Chinese
> text corpora?
> It will be used by a student with a very basic background, so standalone
> or GUI options would be preferred.
>
> Thanks in advance,
>
> E.
>
>
>
>
> ************************************************************************
> Emiliano R. Guevara
> Facoltà di Lingue e Lett. Straniere - Dip. di Lingue e Lett. Straniere
> Università di Bologna - Via Cartoleria 5 (40124) Bologna, Italia
> http://morbo.lingue.unibo.it/
> emiliano.guevara at unibo.it - emiguevara at gmail.com
> ************************************************************************
>
>
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>