[Corpora-List] Named Entity Corpora in Dutch

Martin Reynaert reynaert at uvt.nl
Wed Nov 7 21:28:58 CET 2012

Dear Ivelina,

For Dutch we now have the SoNaR-500 corpus (currently about 540 million word tokens of contemporary written Dutch, automatically annotated) and the SoNaR-1 corpus (about 1 million word tokens of contemporary written Dutch, largely manually annotated for semantics).

For Named Entity Recognition, the Support Vector Machine tool used to automatically label SoNaR-500 (called 'NERD', for 'Named Entity Recognition for Dutch', developed by Bart Desmet at LT3, Ghent University) was trained on the named entities manually labeled in SoNaR-1.

To acquire the corpus, please enquire at the Dutch HLT Agency:


The full corpus may not be available yet, but it should be soon, and you can at least sort out the licensing at this stage. In fact, I am currently curating parts of its metadata.



On 11/07/2012 06:23 PM, Ivelina Nikolova wrote:
> On 11/07/2012 05:49 PM, Alberto Lavelli wrote:
>> The CoNLL 2002 shared task concerned Named Entity Recognition for
>> Spanish and Dutch.
>> You can find information about the CoNLL series here:
>> http://ifarm.nl/signll/conll/
>> Hope this helps
> Thanks Alberto!
> Several people pointed me to this task's corpus in particular. It
> seems to be the most widely used one.
> Best,
> Ivelina
>> alberto
>> On Wed, Nov 07, 2012 at 04:13:07PM +0200, Ivelina Nikolova wrote:
>>> Dear Corpora Members,
>>> I am searching for corpora in Dutch with Named Entity annotations.
>>> I'm interested in Person, Location, Organization and Event mentions.
>>> Do you have any suggestions on that?
>>> Thank you very much!
>>> Ivelina
>>> --
>>> Ivelina Nikolova
>>> PhD student in Computer Science
>>> Linguistic Modelling Department
>>> Institute of Information and Communication Technologies
>>> Bulgarian Academy of Sciences
>>> _______________________________________________
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora at uib.no
>>> http://mailman.uib.no/listinfo/corpora
