[Corpora-List] corpus syntax (and how we can use it to code meaning)

Eric Atwell eric at comp.leeds.ac.uk
Tue Sep 18 10:27:41 CEST 2007


On Tue, 18 Sep 2007, Rob Freeman wrote:


> I want to summarize some of the more practical aspects of those solutions.

Thanks for this summary; now I won't have to re-read the whole thread to remind myself what's been discussed :-)


> ..., we might use the context about a word or phrase to
> select, ad-hoc, a class of words or phrases which are similar to that word or
> phrase (in that context.) ... we can use these true/not
> true distinctions to select both syntax, and meaning, specific to context,
> in ways we have not been able up to now.

This suggests that corpus linguists should be interested in clustering or unsupervised machine learning of words into classes according to shared contexts. In fact they have been investigating this for some time; see e.g. papers in the Proceedings of ICAME'86 and EACL'87. The main difference between then and now is computing power: we can now use more sophisticated clustering algorithms, and cluster according to more complex context patterns, e.g. Roberts et al. (2006) in Corpora, vol. 1, pp. 39-57.

But my impression is that most corpus linguists are not really that interested in unsupervised machine learning, i.e. letting the computer work out the grammar/semantics "from scratch"; they prefer to examine and analyse the corpus data "by hand" to select examples to back up their own theories...
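
To make "clustering words into classes according to shared contexts" concrete, here is a minimal Python sketch. It is only an illustration, not the method of any of the papers mentioned above: the toy corpus, the choice of immediate left/right neighbours as the "context", the cosine measure, the greedy single-pass grouping and the 0.4 threshold are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Map each word to counts of its immediate left and right neighbours."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        # Boundary markers give sentence-initial/final words a context too.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.4):
    """Greedy single pass: a word joins the first class whose first member
    has a similar enough context vector, otherwise it starts a new class."""
    classes = []
    for word in sorted(vectors):
        for members in classes:
            if cosine(vectors[word], vectors[members[0]]) >= threshold:
                members.append(word)
                break
        else:
            classes.append([word])
    return classes


# Hypothetical toy corpus, chosen so that the shared contexts are obvious.
toy_corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat slept on the mat",
    "a dog slept on the rug",
]

classes = cluster(context_vectors(toy_corpus))
print(classes)
```

On this toy corpus "cat" and "dog" end up in one class and "sat" and "slept" in another, purely because they occur in the same left/right contexts; no grammar is supplied. A real experiment would need a far larger corpus, richer context patterns and a proper clustering algorithm.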

Eric Atwell, Leeds University WWW/email: google Eric Atwell


