[Corpora-List] Looking for Polish news corpus

Martin Wynne martin.wynne at bodleian.ox.ac.uk
Mon Jun 26 12:33:09 CEST 2017


There is also the ChronoPress corpus of historical Polish newspaper texts. The current version of the corpus covers 1945-54 and is available from the Polish CLARIN centre repository here:

https://clarin-pl.eu/dspace/handle/11321/260

The corpus can be queried online at http://chronopress.clarin-pl.eu, and I understand that the intention is to expand the timescale of the corpus to cover 100 years from 1918, so watch that space!

Best wishes, Martin Wynne

-- Oxford Text Archive, Bodleian Libraries, University of Oxford Tel: +44 1865 283813 martin.wynne at bodleian.ox.ac.uk

On 26/06/17 11:00, corpora-request at uib.no wrote:
> Date: Sun, 25 Jun 2017 12:01:13 +0200
> From: Agata Savary <agata.savary at univ-tours.fr>
> Subject: Re: [Corpora-List] Looking for Polish news corpus
> To: corpora at uib.no
>
> Hi Janne,
>
> Sorry for the late answer.
> The National Corpus of Polish <http://nkjp.pl/index.php?page=0&lang=1> contains 1.5 billion words. A 1-million-word subcorpus is manually
> double-annotated and adjudicated.
> The corpus has several annotation layers: segmentation, morphology, shallow syntax (with some multiword expressions), named entities and word senses.
> All of it is downloadable <http://clip.ipipan.waw.pl/NationalCorpusOfPolish> under an open license.
> A good part of this corpus consists of newspaper texts. I hope you find it useful.
>
> Agata
>
> On 04/24/2017 01:41 PM, Janne Bondi Johannessen wrote:
>> Dear colleagues,
>>
>> Do any of you know of a substantial and downloadable Polish corpus? We need it for a project on distributional semantics.
>>
>> Best wishes,
>> Janne Bondi Johannessen
>>
>> --
>> Janne Bondi Johannessen <http://www.hf.uio.no/multiling/english/people/core-group/jannebj/index.html>
>> Professor, University of Oslo & editor of Norsk Lingvistisk Tidsskrift
>> The Text Laboratory, ILN &
>> Center for Multilingualism in Society across the Lifespan
>> P.O.Box 1102 Blindern, 0317 Oslo, Norway
>> Tel: +47 22 85 68 14, mob.: +47 928 966 34, e-mail: jannebj at iln.uio.no <mailto:jannebj at iln.uio.no>


