[Corpora-List] Question about morphological dictionaries

maxwell maxwell at umiacs.umd.edu
Thu Jul 28 15:02:04 CEST 2016


On 2016-07-28 04:50, Andrii Elyiv wrote:
> Could you help me with one information? I am looking for the
> morphological dictionaries for different languages. Actually I need
> just names of all countries and its citizens in all possible forms and
> grammatical cases.
> For example, in English: Italy, Italian, Italians
> In Italian: Italia, italiano, italiana, italiane
> In Turkish: Italya, Italyan, Italyana, Italyanda, Italyandan, ...
> Mainly I need it for Spanish, French, Dutch, Polish, Czech,
> Portuguese, Arabic, Punjabi, Tamil and Hindi languages.

I doubt that a general-purpose _dictionary_ (in the sense of a word list with definitions or translations) listing all inflected forms exists for most of these languages. You might look at aspell or hunspell spelling dictionaries, but for those languages on your list that have a very large number of inflected forms, you would be better off using a morphological transducer. For Arabic, for example, there is a free version of the Buckwalter parser, which can be obtained from various places (e.g. http://nlp.stanford.edu/software/parser-arabic-faq.shtml). In principle you could, if you really wanted to, dump the surface side of such a transducer to a text file, but for some of these languages that would take a long time and produce a very large file.
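
To make "dumping the surface side" concrete, here is a minimal sketch in Python. It is not any real toolkit's API: the arc table, the tags (+Nom, +Dat, +Loc, +Abl) and the function name surface_forms are all made up for illustration, and the word forms are simply the ones from your Turkish example. It also assumes the transducer is acyclic; a real one may contain cycles (compounding, for instance), in which case a naive enumeration like this would never finish.

```python
from typing import Dict, Iterator, List, Tuple

# Toy acyclic transducer (hypothetical, for illustration only).
# Each state maps to a list of arcs: (lexical_label, surface_label, next_state).
Arc = Tuple[str, str, int]
ARCS: Dict[int, List[Arc]] = {
    0: [("Italyan", "Italyan", 1)],
    1: [
        ("+Nom", "", 2),     # bare stem
        ("+Dat", "a", 2),
        ("+Loc", "da", 2),
        ("+Abl", "dan", 2),
    ],
}
FINAL = {2}  # accepting states


def surface_forms(state: int = 0, prefix: str = "") -> Iterator[str]:
    """Yield the surface string of every accepting path from `state`.

    Only the surface labels are concatenated; the lexical side is ignored.
    Terminates only because the toy transducer has no cycles.
    """
    if state in FINAL:
        yield prefix
    for _lexical, surface, next_state in ARCS.get(state, []):
        yield from surface_forms(next_state, prefix + surface)


if __name__ == "__main__":
    # One surface form per line, as in a plain word list.
    for form in surface_forms():
        print(form)
```

Even in this toy, the number of output lines is the number of paths through the network, which is why the dump gets so large for heavily inflecting languages.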

Mike Maxwell


