[Corpora-List] How to do Japanese word segmentation using extraterm list?

hf.jiang hf.jiang at gmail.com
Fri Oct 21 10:43:45 CEST 2011


Thanks Pham.

I have found the solution. The manual page (http://mecab.sourceforge.net/dic.html) includes what I need, and I asked a friend who knows Japanese to explain it to me.

I wish my English were better, so that I could provide colleagues with an English version of the manual.
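
For readers who cannot read the Japanese manual, here is a minimal sketch of the user-dictionary approach that page describes: each term in the list becomes one CSV row, and the CSV is then compiled into a user dictionary. The sketch assumes the IPADIC system dictionary (13 comma-separated columns) and treats every term as a general noun. The function names, the cost value of 10, and the sample terms are illustrative assumptions, not part of MeCab itself.

```python
import csv
import io


def term_to_row(surface, cost=10, reading="*", pronunciation="*"):
    """Return one IPADIC-style user-dictionary row for a noun-like term.

    Assumption: the system dictionary is IPADIC; other dictionaries
    (e.g. UniDic) use a different set of feature columns.
    """
    return [
        surface,                    # surface form as it appears in text
        "", "",                     # left/right context IDs, left empty here
        str(cost),                  # word cost: lower means more preferred (value is a guess)
        "名詞", "一般", "*", "*",    # part of speech and three subcategories
        "*", "*",                   # conjugation type and conjugation form
        surface,                    # base form
        reading,                    # reading, "*" when unknown
        pronunciation,              # pronunciation, "*" when unknown
    ]


def write_user_csv(terms, stream):
    """Write one row per term to an open text stream."""
    writer = csv.writer(stream, lineterminator="\n")
    for term in terms:
        writer.writerow(term_to_row(term))


if __name__ == "__main__":
    buf = io.StringIO()
    write_user_csv(["形態素解析", "コーパス言語学"], buf)
    print(buf.getvalue(), end="")
```

As far as I understand the manual, the resulting file (say, user.csv) is compiled with something like `mecab-dict-index -d <system dictionary directory> -u user.dic -f utf-8 -t utf-8 user.csv`, and the compiled file is then passed to the tagger with `mecab -u user.dic`. The context ID columns are left empty on the understanding that mecab-dict-index assigns them automatically. The cost usually needs tuning: if a term is still split, try a lower value.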

-Hongfei Jiang

------------------ Original ------------------
From: "Minh Pham"<minhpham0902 at gmail.com>
Date: Thu, Oct 20, 2011 04:04 PM
To: "Adam Kilgarriff"<adam at lexmasterclass.com>
Cc: "hf.jiang"<hf.jiang at gmail.com>; "corpora"<corpora at uib.no>; "Hiram Calvo"<hiramcalvo at gmail.com>; "Jan Pomikále"<xpomikal at fi.muni.cz>
Subject: Re: [Corpora-List] How to do Japanese word segmentation using extraterm list?

Hi,

Could you please tell us exactly what the input is and what the desired output is?

By the way, after installing the MeCab tool, you can view its help from the command line by typing:

mecab.exe --help

The help is in English.
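
Two of the options listed in that help are `-O` (output format) and `-u` (user dictionary). As a rough sketch, the tool can be driven from a script as below. It assumes a `mecab` executable on the PATH (on Windows, `mecab.exe`), a UTF-8 dictionary, and a hypothetical compiled user dictionary file named user.dic; the function names are made up for illustration.

```python
import subprocess


def build_mecab_command(user_dic=None, wakati=True, executable="mecab"):
    """Build the argument list for a MeCab call.

    -O wakati asks for space-separated tokens; -u adds a user dictionary.
    """
    cmd = [executable]
    if wakati:
        cmd += ["-O", "wakati"]
    if user_dic:
        cmd += ["-u", user_dic]
    return cmd


def segment(text, **kwargs):
    """Run MeCab on one string and return its tokens (requires MeCab installed)."""
    result = subprocess.run(
        build_mecab_command(**kwargs),
        input=text,
        capture_output=True,
        encoding="utf-8",  # assumption: the dictionary was built for UTF-8
        check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    # Only prints the command; call segment(...) once MeCab is installed.
    print(" ".join(build_mecab_command(user_dic="user.dic")))
```

Running `segment` on the same sentence with and without `user_dic` is a quick way to check whether the entries in a term list are actually being picked up.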

Best regards, Pham

On Thu, Oct 20, 2011 at 4:22 PM, Adam Kilgarriff <adam at lexmasterclass.com> wrote:

> However, since almost all of the user manual is in Japanese, I cannot understand it completely.

We have the same problem. Are there any English versions anywhere (especially for MeCab)? Pointers and advice appreciated.

Adam

On 20 October 2011 08:08, hf.jiang <hf.jiang at gmail.com> wrote:

Hi all,

I'm currently trying to process Japanese texts.

Some friends suggested that I use ChaSen or MeCab.

However, since almost all of the user manual is in Japanese, I cannot understand it completely.

What I want is for the segmentation tool to prefer the words in my term list, so that those terms are recognized as single words.

Note that I do not have enough gold-standard data to train the tools, so an off-the-shelf tool would suit me better.

Looking forward to your reply, thanks.

-Hongfei Jiang

_______________________________________________

UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora

Corpora mailing list

Corpora at uib.no

http://mailman.uib.no/listinfo/corpora

--
========================================
Adam Kilgarriff
adam at lexmasterclass.com

Director, Lexical Computing Ltd
Visiting Research Fellow, University of Leeds
Corpora for all with the Sketch Engine

DANTE: a lexical database for English
========================================

--
Pham Quang Nhat Minh (Mr)
PhD student
NLP Laboratory - School of Information Science - JAIST
1-1 Asahidai, Nomi, 923-1292 Japan
Email: minhpqn at jaist.ac.jp

Web: http://www.jaist.ac.jp/index-e.html


