[Corpora-List] Looking for Polish news corpus

Agata Savary agata.savary at univ-tours.fr
Sun Jun 25 12:01:13 CEST 2017

Hi Janne,

Sorry for the late answer. The National Corpus of Polish <http://nkjp.pl/index.php?page=0&lang=1> contains 1,500 million words. A 1-million-word subcorpus is manually double-annotated and adjudicated. The corpus has several annotation layers: segmentation, morphology, shallow syntax (with some multiword expressions), named entities and word senses. It is all downloadable <http://clip.ipipan.waw.pl/NationalCorpusOfPolish> under an open license. A good part of this corpus consists of newspaper texts. I hope you find it useful.


On 04/24/2017 01:41 PM, Janne Bondi Johannessen wrote:
> Dear colleagues.
> Do any of you know of a substantial and downloadable Polish corpus? We need it for a project on distributional semantics.
> Best wishes,
> Janne Bondi Johannessen
> --
> Janne Bondi Johannessen <http://www.hf.uio.no/multiling/english/people/core-group/jannebj/index.html>
> Professor, University of Oslo & editor of Norsk Lingvistisk Tidsskrift
> The Text Laboratory, ILN &
> Center for Multilingualism in Society across the Lifespan
> P.O.Box 1102 Blindern, 0317 Oslo, Norway
> Tel: +47 22 85 68 14, mob.: +47 928 966 34, e-mail: jannebj at iln.uio.no

--
Agata Savary
Associate Professor
Université François Rabelais de Tours
3 place Jean-Jaurès, 41029 Blois, France
phone: +33 (0)2 54 55 21 47
agata.savary at univ-tours.fr
http://www.info.univ-tours.fr/~savary/
PARSEME COST Action: http://www.parseme.eu
