[Corpora-List] corpus ------>>>>> thesaurus

Dominic Widdows widdows at maya.com
Tue Nov 9 16:16:00 CET 2004



> Hi Vladimir,
>
> You can find a good introduction to lexical acquisition methods based on
> co-occurrence statistics in Manning and Schuetze's "Foundations of
> Statistical Natural Language Processing".


Hi Vladimir,

Just to add to Viktor's suggestion: we have a few demos of thesaurus
generation / lexical acquisition, some of which are based directly on
Schuetze's work, at
http://infomap.stanford.edu/webdemo

There are a couple of fairly domain-specific models built from the
Ohsumed medical corpus and the Wall Street Journal (though the latter
has a lot of general topics as well).

You can find links to papers (including work on mapping words and
senses from corpus-derived models into hand-built lexical resources)
and some software for processing corpora into vector word-association
models (using a form of latent semantic analysis) on the main site at
http://infomap.stanford.edu/
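
For a rough feel of what a vector word-association model does, here is a
minimal sketch in Python. It is not the Infomap software and does not
reproduce its method: it builds raw co-occurrence vectors from a toy
corpus and ranks neighbours by cosine similarity, and it leaves out the
dimensionality reduction step that latent semantic analysis applies. The
function names, the window size and the four example sentences are all
made up for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, vectors, k=3):
    """Return the k words whose co-occurrence vectors are closest to `word`."""
    target = vectors.get(word, Counter())
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


# Toy corpus (hypothetical), already tokenised.
sentences = [
    "the doctor treated the patient".split(),
    "the nurse treated the patient".split(),
    "the broker sold the shares".split(),
    "the trader sold the shares".split(),
]

vectors = cooccurrence_vectors(sentences, window=2)
print(nearest_neighbours("doctor", vectors, k=3))
```

On this toy corpus "nurse" comes out as the closest neighbour of
"doctor", because the two words occur in identical contexts. A real
model would be built from a large corpus and would normally reduce the
co-occurrence matrix to a few hundred dimensions before comparing words.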

Best wishes,
Dominic





