[Corpora-List] Survey: applications using grammar-based parsers

Michael Piotrowski mxp
Wed Apr 1 00:08:04 CEST 2009

On 2009-03-31, Trond Trosterud <trond.trosterud at uit.no> wrote:

> Here comes a summary of the answers to the query.

Thank you for summarizing. Your query caught my interest since we may have a need for an embeddable parser in our project on linguistically supported editing <http://www.lingured.info/>.

Unfortunately, it seems that when you want not just the performance numbers cited in papers but the actual, working system, it frequently turns out to be unavailable (dead project, results locked away, commercial, etc.) or unusable in an application (too slow, not embeddable, etc.). At least our experience with morphological analysis and generation for German has been quite sobering.


> Two of the feedbacks refer to commercial systems (langos, the
> rulebased MT systems, one reviewer also referred to the commercial CG-
> based company Connexor). Whereas being commercial is in itself a
> strong indication of good results (customers will not accept
> malfunction),

I think this is pretty optimistic. I don't know anything about the systems you mentioned, and I don't want to discredit anybody, but in general, being commercial is only a strong indication of marketing and business skills. It says nothing about quality, good or bad. One has to keep in mind that the average customer (including corporate customers) is very tolerant of quality problems in software, especially in markets with little competition.

> it also makes it hard to evaluate them: For commercial reasons, their
> source code, or even in some cases the (methodology behind their)
> approach, is kept confidential. Nothing more can thus be said about
> them here.

And this is a big problem. My experience is that these trade-secret approaches are rarely unique. Quite often, it's just too messy to show to anybody...

> When I look at the other three parsers, GTA, WCDG, and Link grammar, I
> find that they all bear some resemblances to the CG framework: The
> parsing is based upon bottom-up local relations (looking at the
> relations the words may have to each other), and they are thus always
> able to come up with an analysis.


> These frameworks are missing in my survey, as I also suspected. What I
> had expected was to see some LFG and HPSG version of iCALL programs,
> as the language in pedagogical QA systems may be restricted, thereby
> compensating for weaker results for unbounded text, but then, these
> parsers would have been excluded by my first criterion. In order to
> analyse unbounded text reliably it thus seems that a framework with
> the properties of the 4 approaches referred to here is needed. That
> fst systems are successful for morphology but not for syntax I see as
> a healthy reminder of the difference between these two domains.

Speaking of frameworks, I might mention the Malaga system <http://home.arcor.de/bjoern-beutel/malaga/>. Unfortunately, no larger syntax grammar is publicly available, but it provides a uniform framework for morphology and syntax, and it can easily be embedded into applications.

Greetings from Switzerland

-- 
Michael Piotrowski, M.A. <mxp at cl.uzh.ch>
Institute of Computational Linguistics, University of Zurich
Phone +41 44 63-54394 | OpenPGP public key ID 0x1614A044
