I think you may misunderstand what machine learning can do, though of course it all depends on what you mean by learning/generalizing from the "same context". Modesty normally forbids me citing, say, http://citeseer.ist.psu.edu/stevenson01interaction.html, where Stevenson and I combined learners for word-sense disambiguation over quite a large corpus (The Red Badge of Courage). One way of interpreting what the learner was doing (and it is something some would find distasteful) is that it was learning, for each sense of each content word, what were the contexts and criteria that would disambiguate it. There are other bits of contemporary and later work that could also be described that way (and this was not at all simple unsupervised learning, either).
Best
Yorick
PS On the on-going meta-issues, I fear the last paragraph of Eric Atwell's message is very insightful as to what is really going on here, under cloaks of "private fights", "abstract discussions", "separate lists" etc.:
"But my impression is that most Corpus Linguists are not really that interested in unsupervised Machine Learning, i.e. letting the computer work out the grammar/semantics "from scratch"; they prefer to examine and analyse the corpus data "by hand" to select examples to back up their own theories..."
I have a hunch most Corpus Linguists are not interested much in computation in general, except as a secretarial/editing/retrieval tool, but they have to pay lip service to it. Paradoxically, I think, it is CL/NLP researchers who actually "trust the text", in that they are experimenters who, by definition, don't know what the results of computation/experiment will be. Many Corpus Linguists, I suspect, and there are honourable exceptions, know exactly where they are going and are as dependent on intuition and judgement as Chomskyans, whom they still affect to criticize, and for reasons not altogether clear to me. I have an on-going struggle with a distinguished lexicographer friend and colleague, who uses sophisticated KWIC indices to display contexts of a word, which he then classifies by intuition. Suggestions as to how this last stage could be automated, and I have made many over the years, are never well received, and I have stopped making them.
On 18 Sep 2007, at 11:31, Rob Freeman wrote:
> On 9/18/07, Eric Atwell <eric at comp.leeds.ac.uk> wrote:
> On Tue, 18 Sep 2007, Rob Freeman wrote:
> > ..., we might use the context about a word or phrase to
> > select, ad-hoc, a class of words or phrases which are similar to
> > that word or phrase (in that context.) ... we can use these true/not
> > true distinctions to select both syntax, and meaning, specific to
> > in ways we have not been able up to now.
> This suggests that corpus linguists should be interested in clustering
> or unsupervised machine learning of words into classes according to
> shared contexts; but they have been investigating this for some time,
> see e.g. papers in Proceedings of ICAME'86, EACL'87.
> The main difference between then and now is compute power: we can now
> use more sophisticated clustering algorithms, and cluster according to
> more complex context patterns, e.g. Roberts et al in Corpora, vol. 1,
> pp. 39-57. 2006.
> Yes, people have been clustering words into classes according to
> shared contexts for some time.
> The point here is the idea that they need to cluster them into a
> different class for each context in which they occur.
> It is the goals of machine learning which I am suggesting need to
> change (viz. a complete grammar), not the methods.
> I think computational linguistics will get good results as soon as
> it stops looking for global generalizations and clusters ad-hoc,
> according to context.
> But my impression is that most Corpus Linguists are not really that
> interested in unsupervised Machine Learning, i.e. letting the computer
> work out the grammar/semantics "from scratch"; they prefer to examine and
> analyse the corpus data "by hand" to select examples to back up their
> own theories...
> Whether they are working "by hand" or not, people are not used to
> thinking of syntax as ad-hoc generalization according to shared
> contexts. I'm suggesting this idea needs to be taken out of machine
> learning (where it has only been seen as a means to find "grammar"
> anyway, and not a principle of syntax in its own right) and given a
> broader airing as a principle of syntax on its own merits.
> It might explain why MWE's tend to have the same "slot fillers" for
> instance. Detailed analyses of what slot fillers can occur in a
> given MWE could be done on the basis of what other contexts two
> words share and do not share.
> Corpus analysis currently tends to be done in terms of lexicon,
> what units are repeated, how often. Corpus style syntactic analyses
> could be done on the basis of what words share what contexts, and
> how well this predicts the range of combinations they participate
> in, how MWE's change over time etc.
> Corpora mailing list
> Corpora at uib.no