[Corpora-List] Named Entity Corpora in Dutch

Ivelina Nikolova iva at lml.bas.bg
Thu Nov 8 10:46:21 CET 2012


Thanks Martin and Mikhail! I'll be checking out your references.

Ivelina

On 8.11.2012 at 01:28, Mikhail Kozhevnikov wrote:
> Dear Martin,
>
> To my knowledge even the bits already annotated are not available yet,
> as the data has not been officially released. I've tried to obtain the
> SRL annotations described in this paper
> <http://lt3.hogent.be/media/uploads/publications/2012/FinalSRL.pdf> at
> the end of September and got the following reply:
>
> The SRL annotations are not part of the second release of the
> intermediate SoNaR results. The final release will comprise SRL
> annotations: a 500K corpus that has been automatically labeled and
> a 500K corpus that has been completely manually verified.
> We do not know when the final release will be available, since the
> project is still not officially closed: an evaluation has shown
> that some alterations need to be made and documentation needs to
> be added. We cannot start distribution before the official end
> of the project.
>
>
> I too would be very interested in any new information concerning the
> release date or (partial) availability of the data.
>
> Regards,
> Mikhail
>
> On Wed, Nov 7, 2012 at 9:28 PM, Martin Reynaert <reynaert at uvt.nl
> <mailto:reynaert at uvt.nl>> wrote:
>
> Dear Ivelina,
>
> For Dutch we now have the SoNaR-500 corpus (currently about 540
> million word tokens of contemporary written Dutch, automatically
> annotated) and the SoNaR-1 corpus (about 1 million word tokens of
> contemporary written Dutch, largely manually annotated for semantics).
>
> For Named Entity Recognition the Support-Vector Machine tool
> (called 'NERD' for 'Named Entity Recognition for Dutch', developed
> at LT3, Ghent University, by Bart Desmet) used to automatically
> label SoNaR-500 was trained on the NEs manually labeled in SoNaR-1.
>
> To acquire the corpus, please enquire at the Dutch HLT Agency:
>
> http://www.inl.nl/tst-centrale/
>
> The corpus itself may not be fully available yet, but it should
> be soon, and you can at least sort out the licensing at this
> stage. In fact, I am currently curating parts of its metadata.
>
> Best,
>
> Martin
>
>
>
>
>
> On 11/07/2012 06:23 PM, Ivelina Nikolova wrote:
>
> On 11/07/2012 05:49 PM, Alberto Lavelli wrote:
>
> The CoNLL 2002 shared task concerned Named Entity
> Recognition for
> Spanish and Dutch.
> You can find information about the CoNLL series here:
>
> http://ifarm.nl/signll/conll/
>
> Hope this helps
>
>
> Thanks Alberto!
> I have received several references to this shared task's corpus
> in particular. It seems to be the most widely used one.
>
> Best,
> Ivelina
>
>
>
> alberto
>
>
> On Wed, Nov 07, 2012 at 04:13:07PM +0200, Ivelina Nikolova
> wrote:
>
> Dear Corpora Members,
>
> I am searching for corpora in Dutch with Named Entity
> annotations.
> I'm interested in Person, Location, Organization and
> Event mentions.
> Do you have any suggestions on that?
>
> Thank you very much!
> Ivelina
>
> --
> Ivelina Nikolova
> PhD student in Computer Science
> Linguistic Modelling Department
> Institute of Information and Communication Technologies
> Bulgarian Academy of Sciences
>
>
> _______________________________________________
> UNSUBSCRIBE from this page:
> http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no <mailto:Corpora at uib.no>
> http://mailman.uib.no/listinfo/corpora
