[Corpora-List] GATE Processing Resources for Dutch

Adam Funk a.funk at dcs.shef.ac.uk
Thu Nov 15 14:26:00 CET 2012


[24/10/12 17:56] Diana Maynard wrote:


> However, if you particularly want it to work in GATE, it's possible
> we'll be integrating a newer version of OpenNLP into GATE shortly, which
> has models for Dutch. So that would be the simple solution, though I
> have no idea how good the Dutch components are.

We've done this now. Below is a copy of the announcement from the gate-users mailing list.

If you download the model files for Dutch and put them in the models/dutch subdirectory of GATE's OpenNLP plugin, the sample application should just work.
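
As a rough illustration of that step, here is a minimal Python sketch that copies downloaded Dutch model files into the plugin's models/dutch directory. It is not part of GATE or OpenNLP. The function name, the location of the plugin directory, and the assumption that the Dutch files match nl-*.bin (for example nl-token.bin) are mine, so check them against what you actually downloaded. It also skips any file with "perceptron" in its name, since the announcement below says the GATE PR does not support the Perceptron POS models.

```python
import shutil
from pathlib import Path


def install_dutch_models(download_dir, plugin_dir):
    """Copy Dutch OpenNLP model files into <plugin_dir>/models/dutch.

    Assumptions (not confirmed by the GATE docs): the downloaded Dutch
    model files are named nl-*.bin, and plugin_dir is the root directory
    of GATE's OpenNLP plugin on your machine.
    """
    target = Path(plugin_dir) / "models" / "dutch"
    # Create models/dutch if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(Path(download_dir).glob("nl-*.bin")):
        # Skip Perceptron POS models; only the Maxent ones are supported.
        if "perceptron" in src.name:
            continue
        shutil.copy2(src, target / src.name)
        copied.append(src.name)
    return copied
```

The function returns the names of the files it copied, so you can check that the tokenizer, sentence splitter, and POS tagger models you expected were all found.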

~~~~~

We've substantially updated GATE's OpenNLP plugin over the past few days to use the latest version of the OpenNLP library and the current model files. This updated plugin is available from svn and in today's daily snapshot, and will be included in the 7.1 release.

The plugin includes model files for English and sample applications (gapp files) for English, Dutch, and German. For any language other than English, you need to download the model files yourself, as documented in the updated GATE user guide.

http://gate.ac.uk/sale/tao/splitch21.html#sec:misc-creole:opennlp

Models are available for Danish, German, English, Spanish, Dutch, Portuguese, and Swedish, but not every tool has a model in every language. (The GATE PR supports the Maxent POS tagging models but not the Perceptron ones.)

http://opennlp.sourceforge.net/models-1.5/

If you have annotated corpora, you can train your own models using the OpenNLP training API outside of GATE, as described in the OpenNLP manual.

https://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html

Enjoy!


