[Corpora-List] Is a complete grammar possible (beyond the corpus itself)?
John F. Sowa
sowa at bestweb.net
Tue Sep 11 03:54:41 CEST 2007
Mike, Geoffrey, and Rob,
I completely agree with the following point:
MM> I wouldn't call what has been learned about language in
> the last 50 years by formal linguists a waste; I would say
> we know a *lot* more as a result.
But I also believe that some major adjustments in the goals
of formal linguistics should be made, and I believe that many
prominent formal linguists would agree -- although there is
probably no consensus on how the goals should change. See, for
example, the excerpts below by Barbara Partee and Hans Kamp.
GS> My impression is that there remain a great number of people
> out there who would describe themselves as generative linguists
> and who do assume that NLs are formal systems defined by finite
> sets of clearcut rules... Am I out of date in supposing that a
> large proportion of linguists remain generativists in that sense?
I don't know the percentage, but when Partee and Kamp raise serious
doubts, it's hard for their followers to keep the faith.
When I claimed that natural languages are *not* formal languages,
I was not denying the following points, which I still believe:
1. Generative grammars (or generative systems of any kind) can
be very useful. But they are not going to satisfy Chomsky's
original criterion: generate all and only those sentences
of an NL that most native speakers would consider grammatical.
Perhaps a formal grammar combined with various statistical,
semantic, contextual, and pragmatic considerations may someday
produce quite "natural-like" systems -- but that would be very
far from Chomsky's original conception of a formal grammar
clearly separated from the semantics and pragmatics.
2. Formal techniques of many different kinds can be very useful
for studying and processing NLs. For example, every program
that has ever been written for a digital computer is purely
*formal*, but programs can do lots of useful things with NLs.
3. Controlled natural languages can be very useful -- i.e.,
formally defined languages that happen to use a subset of
the syntax and vocabulary of a natural language.
Many such languages have been defined and used for many
purposes, and more uses for them will probably be found.
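To make point #1 concrete, here is a minimal sketch of a generative
grammar in Python (the grammar and vocabulary are invented for
illustration). It mechanically enumerates every sentence its rules
license -- and even at this toy scale it overgenerates, producing
"the idea sees the cat" alongside the intended sentences, which hints
at why "all and only the grammatical sentences" is so hard to achieve
with syntax alone.

```python
# A toy context-free grammar.  Nonterminals map to lists of
# productions; anything not in GRAMMAR is a terminal word.
# Note it overgenerates: "the idea sees the cat" is licensed too.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],
    "N":   [["cat"], ["idea"]],
    "V":   [["sees"]],
}

def generate(symbol="S"):
    """Yield every terminal word sequence derivable from `symbol`."""
    if symbol not in GRAMMAR:          # terminal word
        yield [symbol]
        return
    for production in GRAMMAR[symbol]:
        # Expand each symbol in the production and combine the results.
        expansions = [[]]
        for sym in production:
            expansions = [left + right
                          for left in expansions
                          for right in generate(sym)]
        yield from expansions

sentences = [" ".join(words) for words in generate()]
print(sentences)   # 4 sentences, grammatical and odd alike
```

The grammar has no recursion, so enumeration terminates; adding a
recursive rule (e.g. NP -> NP PP) would make the language infinite,
which is exactly the situation a real generative grammar is in.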
RF> I'm just saying we should at least explore the possibility
> formal grammars are "necessarily incomplete" descriptions of
> corpora, that the right way to handle language is to generalize
> grammar ad-hoc from examples, as you go.
In fact, that is what many, if not most, grammar and parser
developers have been doing for the past 50 years. Everybody who
develops broad-coverage parsers starts small and generalizes
from ad hoc examples (usually selected from one or more corpora)
until the coverage gets better and better.
Unfortunately, this process does not converge: when you switch
to a new corpus for a new genre or sublanguage, coverage drops
dramatically. As I said in my earlier note, the best parsers
available today can parse a very high percentage of the sentences
in several different genres. But if you check what percentage
of the sentences are correctly parsed (according to the standards
of a good linguist), the answer is at best half of them -- and
for most parsers, much less than half.
However, if you delimit the genre very strictly, then you get
something very close to a controlled NL, as in point #3 above.
For a CNL, 100% coverage is possible. But then the authors
who write in that language require training (and/or grammar
checkers) to keep them within the controlled subset.
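The grammar-checker idea for a CNL can be sketched in the same toy
terms (again, the grammar and lexicon here are invented for
illustration, not taken from any real controlled language): a
recognizer accepts a sentence only if it is derivable within the
formally defined subset, and flags everything else.

```python
# A tiny controlled-NL grammar: nonterminal -> binary productions.
CNL = {
    "S":  [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
}
# Closed vocabulary, each word assigned one lexical category.
LEXICON = {
    "the": "Det",
    "valve": "N", "pump": "N",
    "closes": "V", "opens": "V",
}

def derives(symbol, words):
    """True iff the word sequence is derivable from `symbol`."""
    if symbol not in CNL:              # lexical category: match one word
        return len(words) == 1 and LEXICON.get(words[0]) == symbol
    for a, b in CNL[symbol]:           # binary rule: try every split
        for i in range(1, len(words)):
            if derives(a, words[:i]) and derives(b, words[i:]):
                return True
    return False

def check(sentence):
    """Grammar check: is the sentence inside the controlled subset?"""
    return derives("S", sentence.split())

print(check("the valve closes the pump"))   # inside the subset
print(check("close the valve quickly"))     # outside the subset
```

Because the subset is formally defined, coverage of the subset is
100% by construction; the cost, as noted above, is that authors must
be trained (or tooled) into staying inside it.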
Note that in the excerpts below, Kamp and Partee are not
rejecting formal methods completely, but they are acknowledging
that NLs are not formal. They also admit that they do not know
how to fix the current systems in order to make them adequate to
process a significant amount of normal language. That is very
different from what Chomsky and Montague claimed 40 years ago.
A Farewell to Logic in Action: Hans Kamp
First, there is the all-pervasive presence of vagueness that we find in
natural language. For the problems which vagueness, in all its different
forms, presents to the classical semantics of Montague Grammar, there
exists in my view still no fully satisfactory solution.
Secondly, as we have known now for more than two decades, the semantic
form of natural language sentences is wedded to their use in discourse,
where they serve to update information. It is rare for a sentence of
natural language to express the content it is meant to transmit without
remainder. Rather, the sentences that are part of a text or conversation
rely on information that may be assumed to be in the addressee's
possession already, and they are equipped with various kinds of pointers
towards bits of that information. This "dynamic" dimension to natural
language is also something that the formal languages which served
Montague as examples do not possess. This insight has led to a quite
profound change in our understanding of how the semantics of natural
language works. As a consequence, our current conception of the
architecture of natural language semantics is importantly different
from that of Montague Grammar, technically as well as conceptually.
Lecture 4. Formal semantics and the lexicon, Barbara Partee
In Montague’s formal semantics the simple predicates of the language
of intensional logic (IL), like love, like, kiss, see, etc., are
regarded as symbols (similar to the “labels” of PC) which could have
many possible interpretations in many different models, their “real
meanings” being regarded as their interpretations in the “intended
model”. Formal semantics does not pretend to give a complete
characterization of this “intended model”, neither in terms of the
model structure representing the “worlds” nor in terms of the
assignments of interpretations to the lexical constants. The present
formalizations of model-theoretic semantics are undoubtedly still
rather primitive compared to what is needed to capture many important
semantic properties of natural languages, including for example
spatial and other perceptual representations which play an important
role in many aspects of linguistic structure. The logical structure
of language is a real and important part of natural language and we
have fairly well-developed tools for describing it. There are other
approaches to semantics that are concerned with other aspects of
natural language, perhaps even cognitively “deeper” in some sense,
but which we presently lack the tools to adequately formalize. It is
to be hoped that these different approaches can be seen as complementary
and not necessarily antagonistic.