[Corpora-List] Question about morphological dictionaries

Jakub Piskorski jpiskorski at googlemail.com
Thu Jul 28 15:22:27 CEST 2016


For Polish, this resource should come in handy: http://clip.ipipan.waw.pl/Gazetteer
It contains many inflected variants of proper names.



On Thu, Jul 28, 2016 at 3:02 PM, maxwell <maxwell at umiacs.umd.edu> wrote:

> On 2016-07-28 04:50, Andrii Elyiv wrote:
>> Could you help me find some information? I am looking for
>> morphological dictionaries for different languages. Actually, I need
>> just the names of all countries and their citizens in all possible
>> forms and grammatical cases.
>> For example, in English: Italy, Italian, Italians
>> In Italian: Italia, italiano, italiana, italiane
>> In Turkish: Italya, Italyan, Italyana, Italyanda, Italyandan, ...
>> Mainly I need it for Spanish, French, Dutch, Polish, Czech,
>> Portuguese, Arabic, Punjabi, Tamil and Hindi.
> I doubt that a general purpose _dictionary_ (in the sense of a word list
> with definitions or translations) that would list all inflected forms
> exists for most of these languages. You might look for aspell or hunspell
> spelling dictionaries, but for some of these languages where there are lots
> and lots of inflected forms, you'd be better off using a morphological
> transducer. For Arabic, for example, there exists a free version of the
> Buckwalter parser, which can be obtained from various places (e.g.
> http://nlp.stanford.edu/software/parser-arabic-faq.shtml). In principle
> you could, if you really wanted to, dump the surface side of such a
> transducer to a text file, but it would take a long time, and result in a
> very large text file, for some of these languages.
> Mike Maxwell
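To illustrate the point quoted above about dumping surface forms to a text file, here is a toy sketch in Python. It is not a morphological transducer and does not use any real tool's API: it simply concatenates a stem list with a suffix list. The stem and the suffixes are taken from the Turkish example in the original question, and the names STEMS, CASE_SUFFIXES and surface_forms are made up for illustration. A real transducer would also handle phonological alternations and stacked suffixes, which is why a full dump can get very large.

```python
from itertools import product

# Toy data from the Turkish example in this thread (hypothetical lists;
# a real analyzer has far more stems, suffixes, and phonological rules).
STEMS = ["Italyan"]
CASE_SUFFIXES = ["", "a", "da", "dan"]

def surface_forms(stems, suffixes):
    """Yield every stem+suffix concatenation, one surface form at a time."""
    for stem, suffix in product(stems, suffixes):
        yield stem + suffix

forms = list(surface_forms(STEMS, CASE_SUFFIXES))
print("\n".join(forms))
```

The number of lines grows as (number of stems) x (number of suffix combinations), so for heavily inflected languages the resulting file grows quickly.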