[Corpora-List] Summary of responses: German lemma list

Niels Ott niels at drni.de
Sat Mar 10 17:59:00 CET 2007

Dear all,

Over a week ago I asked for a German lemma list and received a number of
replies. Of all the suggestions made, extracting a lemma list from the
ispell word list won the race... because this was the easiest thing to
do in the limited time we had.
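
For anyone wanting to try the same route, here is a minimal Python
sketch of the extraction step. It is not the script we used. It assumes
a dictionary file with one entry per line, an optional "/FLAGS" suffix
carrying affix flags, "#" comment lines, and possibly a hunspell-style
entry count on the first line. The function name and the sample flags
are made up for illustration. Note that the base forms in a spell
checker dictionary are stems for affix expansion, so the result only
approximates a true lemma list.

```python
def extract_base_forms(lines):
    """Return the sorted, de-duplicated base forms from dictionary lines.

    Assumed (hypothetical) input format: one entry per line, optionally
    followed by "/FLAGS"; "#" starts a comment line; a line consisting
    only of digits is treated as a hunspell-style entry count.
    """
    forms = set()
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and a leading entry count.
        if not line or line.startswith("#") or line.isdigit():
            continue
        # Everything before the first "/" is the word; the rest are flags.
        word = line.split("/", 1)[0].strip()
        if word:
            forms.add(word)
    return sorted(forms)


# Illustrative entries only; the affix flags are invented.
sample = ["4", "Haus/EPST", "Katze/N", "gehen/IXY", "sitzen", "# comment", "Haus/EPST"]
print(extract_base_forms(sample))
```

To run it on a real word list, pass an open file instead of the sample,
e.g. `extract_base_forms(open(path, encoding=...))`, with the path and
encoding depending on the dictionary package you have installed.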

Let me briefly summarize the suggestions I received both on the list and
in private (in no particular order):

Annette Klosa offered a contract for academic use of the word list from
the elexiko project, which is based on frequency data from the German IDS
corpora. http://www.elexiko.de/

Lars Aronson suggested using German spell checker dictionaries, namely
those of ispell/aspell/myspell/hunspell.*

René Witte suggested having a look at the Durm Lemmatizer, which
apparently comes with a lexicon.*

Yannick Versley suggested using the lexicon of the CDG parser.*

Peter Adolphs suggested having a look at Morphy by Wolfgang Lezius,
which can export the lexical data it uses. http://www.wolfganglezius.de/

[*]: These are (or are part of) open source projects.

Thank you very much for your assistance!


Niels Ott

Niels Ott wrote:

> Dear all,
>
> about a month ago there was a little discussion going on here about
> English lemma lists.
>
> We need a lemma list for German. There is no special requirement
> other than that it contains lemmata, e.g.
>
> Haus
> Katze
> gehen
> sitzen
>
> Furthermore, it would be nice if the list were equipped with POS. But
> that's not a strict requirement.
>
> It would be admirable if this list were free in the sense of free
> speech/open source or if use were restricted to non-commercial
> applications. (This is for a student project at university.)
>
> Thank you very much in advance for your assistance.
>
> Regards,
>
> Niels Ott



--
Niels Ott - Computational Linguist (B.A.) - http://www.drni.de/niels/
Tangente: Veralgter Wasservogel
