[Corpora-List] Hi

Francis Tyers ftyers at prompsit.com
Wed Nov 18 09:53:44 CET 2009


On Tue, 17 Nov 2009 at 17:23 +0100, Harald Hammarström wrote:
> Dear Rye Abdi,
> Maybe the following papers are relevant to your question. All the best, H
>
> Abdillahi, Nimaan, Pascal Nocera & Juan-Manuel Torres-Moreno. 2006. Boites
> à outils TAL pour les langues peu informatisées: le cas du Somali. In
> Journées d'Analyses des Données Textuelles (JADT 06), 697-705.
> Besançon-France
>
>
> Hurskainen, A. 1992. A Two-Level Computer Formalism for the Analysis of
> Bantu Morphology: An Application to Swahili. Nordic Journal of African
> Studies 1(1). 87-119.
>
> Pauw, G. De & G.-M. de Schryver. 2008. Improving the Computational
> Morphological Analysis of a Swahili Corpus for Lexicographic Purposes.
> Lexikos 18. 303-318.
>
> Pauw, G. De, G.-M. de Schryver & P.W. Wagacha. 2006. Data-driven
> part-of-speech tagging of Kiswahili. In Proceedings of Text, Speech and
> Dialogue, 9th International Conference (LNAI 4188), 197-204. Berlin:
> Springer-Verlag

I would add to this that it is no longer necessary to use the proprietary Xerox toolkit for finite-state morphology. There are two excellent free software (GPL) projects that implement the same formalisms:

* HFST -- for lexc and twol

http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/

* Foma -- for lexc and xfst

http://foma.sourceforge.net/

Both have been tested on a wide variety of lexicons "in the wild", and the authors actively maintain the software and are keen to hear comments and suggestions.

By all means buy the FSM book (it really is fantastic), but I just wanted to let you know that there are free alternatives now.

Regards,

Fran


