[Corpora-List] looking for dutch corpus

Marco Baroni marco.baroni at unitn.it
Thu Feb 5 14:09:13 CET 2015


Dear All,

I am looking for a Dutch corpus with the following characteristics:

- I can download it (for free or for a fee) and process it with my own tools, as opposed to having just online access [I will not redistribute it, I will acknowledge the source in any published work based on it, and I will not use it for commercial purposes, so most licensing schemes should be viable];

- large: ideally billions of words, minimally hundreds of millions of tokens;

- not too much work to convert it to plain text (e.g., I realize that I could create a corpus with the desired characteristics from the Dutch Wikipedia, but if somebody has already done it, I'd be happy to avoid redoing the pre-processing myself).

If anybody has such a corpus, or can point me to one or put me in touch with someone who does, I'll be very grateful.

Best,

Marco

--
Marco Baroni
Center for Mind/Brain Sciences (CIMeC)
University of Trento
http://clic.cimec.unitn.it/marco


