[Corpora-List] english lexicon

Brierley, Claire C.Brierley
Fri Apr 3 23:36:59 CEST 2009


On Thu, 2 Apr 2009, Tine Lassen wrote:
> I am looking for a - preferably - freely available lexicon of English words and their inflectional forms.

Hello Tine,

You might also like to look at ProPOSEL, a prosody and part-of-speech English lexicon, which comes as a textfile of 104,049 word forms (including separate entries for inflected forms) and which merges information from CELEX-2, CUV2/CUVPlus and CMU, the Carnegie-Mellon Pronouncing Dictionary.

In ProPOSEL, each word form is mapped to four variant PoS-tagging schemes (C5; Penn Treebank; LOB; C7); default closed and open-class word categories; canonical phonetic transcriptions (SAM-PA and DISC); syllable counts; consonant-vowel (CV) patterns; and lexical stress patterns, i.e. abstract representations of rhythmic structure. For example, a selection of fields {word; C5 tag; lexical stress pattern; Penn Treebank tag; default content-function word tag; LOB tag; C7 tag; and DISC phonetic transcription mapped to stress weightings} for "secure" and its inflected forms looks like this:

secure|VVI|01|VB|C|VB|VVI|sI:0 'kj9R:1
secure|AJ0|01|JJ|C|JJ,JJB,JNP|JJ,JK|sI:0 'kj9R:1
secures|VVZ|01|VBZ|C|VBZ|VVZ|sI:0 'kj9z:1
secured|VVD|01|VBD|C|VBD|VVD|sI:0 'kj9d:1
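
As a rough illustration of how entries in this shape could be read programmatically, here is a minimal Python sketch. It assumes one pipe-delimited entry per line with the eight fields in the order listed above, and it assumes that a comma inside the LOB or C7 field separates alternative tags. The names Entry and parse_entry and the field labels are hypothetical, chosen for this example only; they are not part of ProPOSEL or of any tool distributed with it.

```python
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    """One lexicon entry; field labels are illustrative, not official."""
    word: str
    c5: str                 # C5 PoS tag
    stress: str             # lexical stress pattern, e.g. "01"
    penn: str               # Penn Treebank PoS tag
    content_function: str   # default content/function word tag
    lob: List[str]          # LOB tag(s); assumed comma-separated
    c7: List[str]           # C7 tag(s); assumed comma-separated
    disc: str               # DISC transcription with stress weightings


def parse_entry(line: str) -> Entry:
    """Split one pipe-delimited line into its eight fields."""
    fields = line.strip().split("|")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    word, c5, stress, penn, cf, lob, c7, disc = fields
    return Entry(word, c5, stress, penn, cf, lob.split(","), c7.split(","), disc)


# The four sample entries from the message, assumed to be one per line.
SAMPLE = """\
secure|VVI|01|VB|C|VB|VVI|sI:0 'kj9R:1
secure|AJ0|01|JJ|C|JJ,JJB,JNP|JJ,JK|sI:0 'kj9R:1
secures|VVZ|01|VBZ|C|VBZ|VVZ|sI:0 'kj9z:1
secured|VVD|01|VBD|C|VBD|VVD|sI:0 'kj9d:1
"""

entries = [parse_entry(line) for line in SAMPLE.splitlines() if line]

# Group entries by word form, since one form can carry several PoS readings.
by_word: Dict[str, List[Entry]] = {}
for entry in entries:
    by_word.setdefault(entry.word, []).append(entry)

for entry in by_word["secure"]:
    print(entry.word, entry.c5, entry.penn, entry.stress)
```

Grouping by word form keeps the verb and adjective readings of "secure" as separate entries under one key, which matches the way the sample lists them.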

There is a paper available here: http://www.lrec-conf.org/proceedings/lrec2008/summaries/724.html

For further information, just contact me.

Claire Brierley
Games Computing and Creative Technologies
University of Bolton, UK


