[Corpora-List] Lemmatizing German text for lexical purposes

Heike Zinsmeister heikezinsmeister at googlemail.com
Mon Jan 16 22:45:52 CET 2012

Hi Ciarán,

On 16 January 2012 16:07, Ciarán Ó Duibhín <ciaran at oduibhin.freeserve.co.uk> wrote:

> Are there any lemmatized corpora of German, which can be queried
> on-line or on Windows? I'm trying to lemmatize some German text myself for
> lexical purposes, and I would like to see how others have handled the
> problems, and how well it works.

You might want to have a look at the DWDS corpora: http://www.dwds.de/.

> Of the German corpora I have found, Negra is POS-tagged but not
> lemmatized, while Tiger is both POS-tagged and lemmatized. Negra does not
> mention any query facility; Tiger had one which is no longer supported
> and unfortunately doesn't work for me.

TIGERSearch is still available: http://www.wolfganglezius.de/doku.php?id=cl:tigersearch (there is also a link to a new version for Mac)

TIGERRegistry, which comes with TIGERSearch, allows you to import the Negra corpus and other formats.
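
If the query tools do not run on a given machine, the lemma annotation can also be read straight from the corpus file. The following is only a minimal sketch, assuming the corpus is in TIGER-XML with one `<t>` element per token carrying `word`, `lemma` and `pos` attributes; the sample sentences and the function name `lemma_counts` are made up for illustration and are not taken from the corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample in the assumed TIGER-XML terminal layout (not real corpus data).
SAMPLE = """<corpus><body>
<s id="s1"><graph><terminals>
<t id="s1_1" word="Die" lemma="der" pos="ART"/>
<t id="s1_2" word="Häuser" lemma="Haus" pos="NN"/>
<t id="s1_3" word="standen" lemma="stehen" pos="VVFIN"/>
</terminals></graph></s>
<s id="s2"><graph><terminals>
<t id="s2_1" word="Das" lemma="der" pos="ART"/>
<t id="s2_2" word="Haus" lemma="Haus" pos="NN"/>
<t id="s2_3" word="steht" lemma="stehen" pos="VVFIN"/>
</terminals></graph></s>
</body></corpus>"""


def lemma_counts(xml_text, pos_prefix=""):
    """Count (lemma, STTS tag) pairs, optionally restricted by a tag prefix."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # Every token is assumed to be a <t> element somewhere below the root.
    for t in root.iter("t"):
        pos = t.get("pos", "")
        if pos.startswith(pos_prefix):
            counts[(t.get("lemma"), pos)] += 1
    return counts


if __name__ == "__main__":
    # List all lemma/tag pairs, most frequent first.
    for (lemma, pos), n in lemma_counts(SAMPLE).most_common():
        print(lemma, pos, n)
```

For a real corpus file, the text would be read from disk instead of from `SAMPLE`; passing a tag prefix such as `"NN"` or `"VV"` restricts the count to nouns or full verbs.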

The latest releases of TüBa-D/Z are also lemmatized (but also use STTS): http://www.sfs.uni-tuebingen.de/en/tuebadz.shtml

As for the decomposition of compound words, SMOR by Helmut Schmid (http://www.ims.uni-stuttgart.de/~schmid/) would provide this (I'm not sure about the release conditions, though).

Best, Heike

--
***********************************
Dr. Heike Zinsmeister

Department of Computer Science, University of Toronto
Toronto, Ontario, CANADA M5S 3G4
*Office:* Room 386, D.L. Pratt Bldg, 6 King's College Road
&
Department of Linguistics, University of Konstanz, Box 185
D-78457 Konstanz, GERMANY

Web: http://ling.uni-konstanz.de/page/home/zinsmeister