[Corpora-List] BCCWJ corpus

Shin Ishikawa iskwshin at gmail.com
Sat Jul 8 01:01:02 CEST 2017


Dear Darren

Both Shonagon and Chunagon are web interfaces for searching BCCWJ.

You can use Shonagon without registration, but it offers only a simple search function (no POS search). You need to register to use Chunagon, where you can run more advanced searches using POS tags.

If you would like to access the text data directly, you need to buy the DVD edition.

Basic info about the DVD edition of BCCWJ (in English): http://pj.ninjal.ac.jp/corpus_center/bccwj/en/dvd-index.html

Info about obtaining the DVD edition (in Japanese): http://pj.ninjal.ac.jp/corpus_center/bccwj/assets_c/2015/09/bccwj-chart-3818.html

Related documents are available from the link below: http://pj.ninjal.ac.jp/corpus_center/bccwj/subscription.html

There is no download version.

Shin

Dr. Shin Ishikawa
Kobe University, Japan
iskwshin at gmail.com

2017-07-08 6:45 GMT+09:00 Darren Cook <darren at dcook.org>:
> Can someone tell me how to download the BCCWJ corpus? There is a
> "shonagon" and a "chunagon" link (*), but the chunagon page describes
> itself as a web application. So I guessed the shonagon was the download;
> but it seems to just be an online search engine for the corpus. Is there
> no free download, and it is only available on the DVDs?
>
> (I just wanted to reproduce the output of an open source tokenizer that
> used BCCWJ for its training data, as a baseline for any improvements or
> bug fixes I might make.)
>
> Thanks,
>
> Darren
>
> *: http://pj.ninjal.ac.jp/corpus_center/bccwj/en/
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora


