[Corpora-List] POS statistics of lemmas in world languages

Mihail Kopotev mihail.kopotev at helsinki.fi
Tue Jul 24 11:14:01 CEST 2018


Dear all,

I am wondering if anybody could provide data on, or point me to, POS statistics of lemmas in world languages, i.e. how many unique verbal/nominal/etc. lemmas are found in a corpus of a known size. Obviously, the figures depend heavily on corpus size, genre, etc., so I am looking for data based on more or less balanced 100+ million corpora. I am especially interested in the following languages:

* Japanese, Korean, Chinese

* Hungarian, Finnish

* Hindi, German, Dutch, English, Polish, Greek, Romanian, Spanish, French

* Malay

* Arabic

* Basque
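
To make the requested statistic concrete, here is a minimal sketch of how the number of unique lemmas per POS could be counted. It assumes the corpus is already lemmatized and POS-tagged and is available as (lemma, POS) pairs; the function name, the tag labels and the toy data are hypothetical and only for illustration.

```python
from collections import defaultdict


def unique_lemmas_per_pos(tagged_tokens):
    """Count distinct lemmas for each POS tag.

    tagged_tokens: iterable of (lemma, pos) pairs (assumed input format).
    """
    lemmas_by_pos = defaultdict(set)
    for lemma, pos in tagged_tokens:
        # A set keeps each lemma once per POS, however often it occurs.
        lemmas_by_pos[pos].add(lemma)
    return {pos: len(lemmas) for pos, lemmas in lemmas_by_pos.items()}


# Toy data, invented for illustration only.
sample = [("dog", "NOUN"), ("run", "VERB"), ("dog", "NOUN"),
          ("cat", "NOUN"), ("run", "NOUN"), ("be", "VERB")]
counts = unique_lemmas_per_pos(sample)
print(counts)
```

Note that a lemma occurring under two tags (here "run") is counted once for each POS, so the per-POS figures do not simply add up to the total number of distinct lemmas.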

Thank you in advance for any pointers!

Best,
Mikhail

--
Mikhail Kopotev, PhD habil.
Associate Professor
Dept. of Modern Languages
University of Helsinki
http://www.helsinki.fi/~kopotev



