[Corpora-List] Frequency of masc./fem/neut. in German

Andras Kornai andras
Fri Apr 17 17:54:11 CEST 2009

On Wed, Apr 15, 2009 at 10:35:13AM -0700, Dan I. Slobin wrote:
> How does this count treat noun compounds? E.g., das Werk, der
> Werkfuehrer, die Werkstatt... / die Kammer, das Kammerwasser, der
> Kammerbeamter...


To the extent that compounds inherit their gender from their head, it is extremely unlikely that the overall numbers would change much: that would require some special effect that impacts the productivity of masc., fem., or neut. bases differentially. You can observe the same broad tendency (neuters contributing only about 15%, the rest split about equally between fem. and masc.) by simply counting die, der, and das in running text. In 10.1m words from Project Gutenberg (typically 19th c. or earlier material) you find

242894 die 238893 der 106332 das

and similarly for 1990s newspaper text (8.4m words of Der Spiegel)

284777 die 265051 der 86214 das

Given that such numbers are easily swayed by style -- compare a 14.9m word sample from Frankfurter Rundschau from the same year that has

501637 der 497189 die 143069 das

and the fact that plurals would favor die over der, the numbers are largely consistent with Sven's findings (but are obtained with far less work).
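
For anyone who wants to repeat the count, here is a minimal sketch in Python. The function names (count_articles, shares) are made up for illustration, and case-insensitive whole-word matching is an assumption: the message does not say how the tokens were matched. The three sets of counts are copied from the figures above.

```python
import re
from collections import Counter

ARTICLES = ("die", "der", "das")

def count_articles(text):
    """Count whole-word occurrences of die, der, das (case-insensitive; an assumption)."""
    # [^\W\d_]+ matches runs of letters, including umlauts and sharp s.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(t for t in tokens if t in ARTICLES)
    return {a: counts[a] for a in ARTICLES}

def shares(counts):
    """Turn raw counts into fractions of the die+der+das total."""
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()}

# Counts as reported in this message.
reported = {
    "Project Gutenberg": {"die": 242894, "der": 238893, "das": 106332},
    "Der Spiegel": {"die": 284777, "der": 265051, "das": 86214},
    "Frankfurter Rundschau": {"die": 497189, "der": 501637, "das": 143069},
}

for name, counts in reported.items():
    pct = {a: round(100 * s, 1) for a, s in shares(counts).items()}
    print(name, pct)
```

On the reported figures the share of das comes out between roughly 12% and 18%, depending on the corpus. Note that der also serves as the feminine genitive/dative and the genitive plural, so these shares are only a crude proxy for gender.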

> Here are some type counts based on noun readings (and not noun
> lemmas)
> in two computational lexica for German,
> ignoring readings with more than 1 possible gender:
> fem masc neut
> HaGenLex 6409 4702 1723
> CELEX+HaGenLex 23311 15846 10064

Altogether, the effect of usage (masculine nouns seem to be used more frequently than their frequency among stems would dictate) appears to be considerably greater than the effect of compounding, but this is just a rough order-of-magnitude impression. It would take quite a bit of work to unravel the impact of these factors across genres and styles.
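
To make the comparison concrete, here is a rough sketch that sets the masc./fem. ratio among lexicon types (from the quoted table) beside the der/die ratio among article tokens (from the corpus counts above). The helper name ratio is hypothetical, and treating der and die as stand-ins for masculine and feminine is a loud assumption: both forms also occur in other case and number slots.

```python
# Type counts from the quoted lexicon table.
lexicon_types = {
    "HaGenLex": {"fem": 6409, "masc": 4702, "neut": 1723},
    "CELEX+HaGenLex": {"fem": 23311, "masc": 15846, "neut": 10064},
}

# Article token counts from the three corpora in this message.
article_tokens = {
    "Project Gutenberg": {"die": 242894, "der": 238893, "das": 106332},
    "Der Spiegel": {"die": 284777, "der": 265051, "das": 86214},
    "Frankfurter Rundschau": {"die": 497189, "der": 501637, "das": 143069},
}

def ratio(counts, numerator, denominator):
    """Ratio of two entries in a dict of counts."""
    return counts[numerator] / counts[denominator]

for name, counts in lexicon_types.items():
    print(f"{name}: masc/fem among types = {ratio(counts, 'masc', 'fem'):.2f}")

for name, counts in article_tokens.items():
    # der/die is only a crude proxy for masc/fem token usage (assumption).
    print(f"{name}: der/die among tokens = {ratio(counts, 'der', 'die'):.2f}")
```

Among types the masc./fem. ratio is around 0.7, while the der/die token ratio is close to 1 in all three corpora, which is the direction of the usage effect described above.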

Andras Kornai
