[Corpora-List] POS statistics of lemmas in world languages
mihail.kopotev at helsinki.fi
Tue Jul 24 11:14:01 CEST 2018
I am wondering whether anybody could provide data on, or point me to,
POS statistics of lemmas in world languages, i.e. how many unique
verbal/nominal/etc. lemmas are found in a corpus of a known size.
Obviously, the figures depend heavily on corpus size, genre, etc., so I
am looking for data based on more or less balanced corpora of 100+
million words. I am especially interested in the following languages:
* Japanese, Korean, Chinese
* Hungarian, Finnish
* Hindi, German, Dutch, English, Polish, Greek, Romanian, Spanish, French
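
To make the requested figure concrete, here is a minimal sketch of the count in question: the number of distinct lemmas per POS tag. It assumes the corpus is already lemmatized and tagged and is available as (lemma, POS) pairs; the function name, the tag labels and the toy sample are illustrative assumptions, not taken from any particular corpus or tool.

```python
from collections import defaultdict


def unique_lemmas_per_pos(tagged_tokens):
    """Count distinct lemmas for each POS tag.

    tagged_tokens: iterable of (lemma, pos) pairs, one per corpus token.
    The input format is an assumption made for this sketch.
    """
    lemmas_by_pos = defaultdict(set)
    for lemma, pos in tagged_tokens:
        # A set keeps each lemma once per POS, however often it occurs.
        lemmas_by_pos[pos].add(lemma)
    return {pos: len(lemmas) for pos, lemmas in lemmas_by_pos.items()}


# Toy sample standing in for a real corpus (hypothetical data).
sample = [
    ("dog", "NOUN"),
    ("run", "VERB"),
    ("dog", "NOUN"),
    ("cat", "NOUN"),
    ("run", "NOUN"),
    ("be", "VERB"),
]

counts = unique_lemmas_per_pos(sample)
print(counts)
```

Note that a lemma occurring under two tags (here "run") is counted once for each tag, so the per-POS figures need not sum to the total number of distinct lemmas.
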
Thank you in advance for any pointers!
Mikhail Kopotev, PhD habil.
Dept. of Modern Languages
University of Helsinki