[Corpora-List] English-UNL Dictionary

Ronaldo Martins r.martins at undlfoundation.org
Tue Apr 20 14:54:56 CEST 2010


(Please distribute, and apologies for multiple postings)

The UNDL Foundation has released a new version of the English-UNL dictionary. The English-UNL dictionary is a bidirectional (EN>UNL, UNL>EN) machine-tractable lexical database comprising more than 200,000 mappings between English and UNL. It provides extensive information about English lexical items, including morphological structure, inflectional paradigms and subcategorization frames, as well as semantic information about UNL entries. The dictionary is available under a Creative Commons Attribution-ShareAlike (CC BY-SA) license at the UNLarium (http://www.unlweb.net/unlarium).

==============================
How was the English-UNL dictionary created?
==============================
The English-UNL dictionary was derived mainly from a word list extracted from the English WordNet 3.0, which was automatically analyzed and manually revised for lexical categories, lexical structure (roots, affixes), part of speech, number (singular, plural, singulare tantum, plurale tantum, invariant), valence, transitivity, inflectional paradigms (for nouns and verbs) and subcategorization frames (according to X-bar theory). English entries were mapped onto entries of the UNL dictionary (i.e., Universal Words, or UWs) and may be freely exported in two different formats: generative, containing only base forms and the corresponding generation (inflectional and composition) rules; and enumerative, containing word forms and lexical features. A sample of entries is presented below.

base form
[foot] {2883} "100284665" (POS=NOU, MOR=STE, LST=WRD, NUM=SNG, INF=M1, FLX(PLR:="feet";)) <eng,0,0>;

word forms
[foot] {2883} "100284665" (POS=NOU, MOR=WFO, LST=WRD, NUM=SNG, INF=M1) <eng,0,0>;
[feet] {2883} "100284665" (POS=NOU, MOR=WFO, LST=WRD, NUM=PLR, INF=M1) <eng,0,0>;
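
For readers who want to process exported entries, the following is a minimal Python sketch of a parser for lines shaped like the samples above. The grammar is inferred only from these three sample entries and is not the official specification (which is documented at the UNLwiki). The function names and the field names (headword, entry_id, uw_id, features, rules, tail) are hypothetical labels chosen for illustration, and the meaning of the trailing <eng,0,0> triple is not assumed here.

```python
import re

# Pattern inferred from the sample entries only (an assumption, not the
# official UNL dictionary grammar):
#   [headword] {id} "quoted id" (feature list) <a,b,c>;
ENTRY_RE = re.compile(
    r'\[(?P<headword>[^\]]*)\]\s*'
    r'\{(?P<entry_id>\d+)\}\s*'
    r'"(?P<uw_id>[^"]*)"\s*'
    r'\((?P<features>.*?)\)\s*'
    r'<(?P<tail>[^>]*)>\s*;'
)


def split_top_level(text):
    """Split on commas that are outside parentheses and double quotes."""
    parts, current = [], []
    depth, in_quotes = 0, False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
            elif ch == ',' and depth == 0:
                parts.append(''.join(current).strip())
                current = []
                continue
        current.append(ch)
    last = ''.join(current).strip()
    if last:
        parts.append(last)
    return parts


def parse_entry(line):
    """Parse one dictionary line into a dict with hypothetical field names."""
    match = ENTRY_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    features, rules = {}, []
    for part in split_top_level(match.group('features')):
        key, sep, value = part.partition('=')
        if sep and '(' not in key:
            # Plain KEY=VALUE feature such as POS=NOU
            features[key.strip()] = value.strip()
        else:
            # Anything else, e.g. FLX(PLR:="feet";), is kept as raw text
            rules.append(part)
    return {
        'headword': match.group('headword'),
        'entry_id': match.group('entry_id'),
        'uw_id': match.group('uw_id'),
        'features': features,
        'rules': rules,
        'tail': [item.strip() for item in match.group('tail').split(',')],
    }


if __name__ == '__main__':
    sample = ('[foot] {2883} "100284665" (POS=NOU, MOR=STE, LST=WRD, '
              'NUM=SNG, INF=M1, FLX(PLR:="feet";)) <eng,0,0>;')
    print(parse_entry(sample))
```

The sketch keeps generation rules such as FLX(...) as raw strings rather than interpreting them, since their semantics are defined by the UNLwiki specification and not by this announcement.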

The English-UNL dictionary is generated in real time according to the specifications and the tagset described at the UNLwiki (http://www.unlweb.net/wiki). As an ongoing project and a dynamic database, the dictionary is subject to continuous augmentation and improvement, and reports on problems and other contributions are most welcome.

==============================
Further information
==============================
For further information, please contact

Ronaldo MARTINS (mailto:r.martins at undlfoundation.org)
Language Resources Manager
UNDL Foundation
48, route de Chancy
CH-1213 - Geneva - Switzerland
+41 22 879 8090

==============================
What is UNL?
==============================
The UNL is an artificial language that has been used for several different tasks in natural language processing, such as machine translation, multilingual document generation, summarization, information retrieval and semantic reasoning. It was originally proposed by the Institute of Advanced Studies of the United Nations University, in Tokyo, and is currently promoted by the UNDL Foundation, in Geneva, Switzerland, under a mandate of the United Nations. [Read more about UNL at http://www.unlweb.net]

==============================
The UNDL Foundation
==============================
The UNDL Foundation (http://www.undlfoundation.org) is a non-profit organization based in Geneva, Switzerland, which has received, from the United Nations, the mandate for implementing the Universal Networking Language (UNL). The UNL Programme is a collaborative effort to create natural language resources and technology to reduce language barriers and strengthen cross-cultural communication in the framework of the United Nations. Participation in the Programme is free and open to individuals and institutions, either as researchers or as developers. Special funds are available for some languages.


