[Corpora-List] Morphological segmentation & Morphosemantic parsing for French
Fiammetta.Namer at univ-nancy2.fr
Fri Feb 4 09:36:00 CET 2005
Hi John and Edina,
I am currently developing a morphosemantic parser for French (DériF)
based on linguistic constraints. Word decomposition is recursive and
hierarchical, and any complex input (be it a neologism or an attested word)
is assigned a pseudo-definition with respect to the morphological process
that relates it to its base.
DériF is being developed one morphological process type at a time (each type
is a module: for instance, noun-to-adjective -ique suffixation is a module),
so it does not account for all morphological processes yet. So far,
it can parse around 30 word formation types, including suffixation,
prefixation, conversion and neoclassical compounding.
It is a simple Perl program that requires only Perl 5.8 to be installed.
Recent DériF developments focus on biomedical terminology. The latest DériF
version allows neoclassical compounds to be grouped into lexical classes
by computing synonymy, hyponymy and approximation relations.
Here is an example:
[ [ gastr N* ] [ algie N* ] NOM ]
" douleur (du -- liée au) estomac "
Constituants = /gastr/algie/
gastralgie/NOM: synonym of gastrodynie/NOM, stomacalgie/NOM,
stomacodynie/NOM, stomachodynie/NOM, (gastralgique/ADJ)
gastralgie/NOM: subtype of abdominalgie/NOM
gastralgie/NOM: see also entéralgie/NOM,
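For readers who want to work with output in the bracketed form shown above, here is a minimal Python sketch of how such an analysis could be read into a nested structure and its constituents listed. It is not part of DériF (which is written in Perl); the function names parse_analysis and constituents are hypothetical, and the sketch assumes that each innermost bracket holds a form followed by its category, as in the example.

```python
import re

def parse_analysis(text):
    """Parse a bracketed analysis such as
    '[ [ gastr N* ] [ algie N* ] NOM ]' into nested lists of strings."""
    # Tokens are '[' , ']' or any run of characters without spaces or brackets.
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "[":
            raise ValueError("expected '['")
        pos += 1
        items = []
        while pos < len(tokens) and tokens[pos] != "]":
            if tokens[pos] == "[":
                items.append(parse_node())   # nested constituent
            else:
                items.append(tokens[pos])    # form or category label
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        pos += 1                             # consume ']'
        return items

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected material after the analysis")
    return tree

def constituents(tree):
    """Collect leaf forms depth-first.
    Assumption: an innermost node looks like [form, category]."""
    if tree and all(isinstance(item, str) for item in tree):
        return [tree[0]]
    forms = []
    for child in tree:
        if isinstance(child, list):
            forms.extend(constituents(child))
    return forms

tree = parse_analysis("[ [ gastr N* ] [ algie N* ] NOM ]")
print(tree)
print("/" + "/".join(constituents(tree)) + "/")
```

Because the parser is recursive, the same sketch should also handle more deeply nested analyses, provided they follow the same bracketing convention.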
More details in
Unfortunately, it is still too soon to release a version of DériF because
the results have yet to be validated.
As soon as the results for medical terminology are validated (i.e. in a few
months, at the end of the French national UMLF project, coordinated by P.
Zweigenbaum and supported by grants from the French Ministry of
Education), they will be made freely available to the scientific community.
At 08:17 27/01/2005 -0600, John A Goldsmith wrote:
>In connection with the Linguistica project
>.edu/alchemist.html ), we are in the process of building gold-standards
>of morphological segmentation in a common XML format for a number of
>languages. Our concern is more with morphological segmentation (and
>allomorphy) and less with tagging of morphosyntactic features.
>I would very much appreciate pointers to any lists of words, in any
>language, with an indication of correct morphological segmentation, or
>pointers to software that does a good job of accomplishing this in
>Some morphological parsers focus on providing lemmatization or
>morphosyntactic features, like Namer's FLEMM mentioned by Jean Véronis, as
>far as I can tell; these do not help us with our task. In addition, since
>our goal is to use these gold standards for testing, rather than for
>training, accuracy is particularly important.
>I'll post a summary of all responses I receive. Thanks very much!