[Corpora-List] dictionary of German antiquated and contracted word forms

Christian Chiarcos christian.chiarcos at web.de
Sun Mar 4 13:36:17 CET 2018

Dear Angelika,

you could try the Adelung (http://woerterbuchnetz.de/cgi-bin/WBNetz/wbgui_py?sigle=Adelung). It covers 18th c. German, although it should not contain poetic contractions. It was also the basis of the morphological analyzer developed in TextGrid (Morphisto), though that probably uses abridged orthography. Furthermore, check http://www.woerterbuchnetz.de/cgi-bin/WBNetz/setupStartSeite.tcl; the Goethe dictionary or Grimm may be applicable.

None of these dictionaries is directly available in machine-readable form, but you may contact the Trier Center for Digital Humanities for the XML sources or just scrape the generated HTML (if that's allowed in your jurisdiction -- for Germany, the new UrhWissG allows you to use up to 75% of a resource *for your own scientific research* [but to disseminate only up to 15%]).

Suggestion: Use the Adelung for 18th c. German and maybe Grimm for pre-Duden 19th c. German, write expansion rules for contractions (there should be very few such rules, mostly e-insertion as in your examples) and double-check whether your rules hit anything in the dictionary.
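
To make the suggestion concrete, here is a minimal Python sketch of such rules, checked against a lexicon. Everything in it is an assumption for illustration: LEXICON is a made-up stand-in for a headword list (it is not taken from Adelung or Grimm), and the two rules in RULES only cover the e-insertion cases from your examples.

```python
import re

# Hypothetical stand-in for a dictionary headword list (lowercased).
# In practice this would be filled from the XML sources or scraped pages.
LEXICON = {"euer", "abentheuer", "abenteuer"}

# Hypothetical expansion rules as (pattern, replacement) pairs.
# Both restore an elided "e" before a word-final "r".
RULES = [
    (re.compile(r"['’]r$"), "er"),     # Eu'r -> Euer, Abentheu'r -> Abentheuer
    (re.compile(r"(?<=eu)r$"), "er"),  # Eur -> Euer, Abenteur -> Abenteuer
]


def expand(form, lexicon=LEXICON):
    """Return the lexicon-attested expansions of a possibly contracted form."""
    lowered = form.lower()
    if lowered in lexicon:
        # Already a known headword, nothing to expand.
        return [lowered]
    candidates = []
    for pattern, replacement in RULES:
        candidate, count = pattern.subn(replacement, lowered)
        # Keep a candidate only if the rule fired and the dictionary confirms it.
        if count and candidate in lexicon and candidate not in candidates:
            candidates.append(candidate)
    return candidates


if __name__ == "__main__":
    for form in ["Euer", "Eur", "Eu'r", "Abentheu'r", "Abenteur", "Feur"]:
        print(form, expand(form))
```

Forms for which no rule yields a dictionary hit come back as an empty list, so they can be collected and reviewed by hand rather than silently mapped to a wrong lemma.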

Best, Christian

On .03.2018, at 15:26, Angelika Peljak-Łapińska <angelika.peljak at gmail.com> wrote:

> Dear Colleagues,
> we're currently working on the corpus of 18th-21st century German
> translations of 'Othello' (the corpus is accessible with some
> purpose-built analysis tools at www.delightedbeauty.org/vvvclosed) and
> we encountered a problem while lemmatizing the data. A dictionary found
> at the Institut für Deutsche Sprache and the WebLicht tool do not work well
> with antiquated and contracted (poetic and vernacular) word forms (e.g.
> Euer/Eur/Eu'r or Abentheuer/Abentheu'r/Abenteuer/Abenteu'r/Abenteur).
> Does anybody know a dictionary that would contain such old orthographic
> variants?
> Regards,
> Angelika Peljak
> PhD student at Swansea University
> PS. In case of any specific questions concerning the corpus please
> contact prof. Tom Cheesman (t.cheesman at swansea.ac.uk).