which is a successful effort to do something that Joel Tetreault asked me to do when I was on his team at Nuance. Comparisons like this are great fun to work on. One point worth really emphasizing is that there are a lot of dimensions of performance and usability for systems of this complexity. For one application you might particularly want speed; for another, every drop of accuracy that can possibly be had. For yet another, your choice might be determined by license terms, by cost, or by compatibility with your tools and infrastructure.
And of course there are new parsers every week...
On 13 May 2016 at 18:51, Koos Wilt <kooswilt at gmail.com> wrote:
> I wrote an overview of the performance of parsers about 4 years ago. Would
> sending it somewhere (e.g. to Mr Brew) be helpful to anyone? It's on my
> other laptop so I have to dig for it.
> 2016-05-13 19:30 GMT+02:00 chris brew <cbrew at acm.org>:
>> It is an unarguable fact that Google's parser gets a higher score on the
>> metrics chosen, which are completely standard in the NLP community. What is
>> actually measured is the percentage of correct links in a graph that
>> connects words to words via labeled links. If, as is common, there are
>> many words in the sentence, there will be many links too, and many
>> opportunities for mistakes. You could get a 90% score and still have a
>> mistake or two in nearly every sentence.
>> Whether this quality level is OK depends entirely on what use you plan to
>> make of the graph that has been produced.
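To make the metric concrete, here is a minimal sketch of how per-dependency accuracy (the standard unlabeled/labeled attachment scores) is computed, and of the per-sentence arithmetic behind "a mistake or two in nearly every sentence". The parses and numbers below are invented for illustration, not taken from the systems discussed:

```python
# Sketch: attachment scores for dependency parses.
# Each token is represented as (head_index, label); 0 means the root.
# The gold/pred parses here are hypothetical four-token examples.

def attachment_scores(gold, pred):
    """Return (UAS, LAS): the fraction of tokens with the correct head,
    and with the correct head *and* the correct label."""
    assert len(gold) == len(pred)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las

gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "amod")]  # one wrong head
uas, las = attachment_scores(gold, pred)
print(uas, las)  # 0.75 0.75

# Why high per-token scores still leave errors in most sentences:
# at 90% per-dependency accuracy, the chance that a 20-word sentence
# is parsed entirely correctly (assuming independent errors) is only
print(round(0.9 ** 20, 2))  # about 0.12
```

So even a parser scoring in the nineties per dependency gets most long sentences at least partly wrong, which is why the fitness of that quality level depends on the downstream use.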
>> The Penn Treebank was made many years ago, with version 2 coming out in
>> 1995. We have learnt a lot about how to annotate corpora and evaluate
>> parsing since then. The Web Treebank is much newer, and reflects painfully
>> learned best practices, so should be good quality, but is on the other hand
>> dealing with much messier language, so performance scores are lower.
>> The current practice of evaluating individual dependencies was introduced
>> as a result of major deficiencies in the first evaluation metrics that were
>> used. It has the major plus of being transparent and straightforward. I
>> believe that improvements in the metric will usually translate into
>> improvements for downstream tasks that use parses as input; I was not
>> so sure of that with the earlier metrics. This is progress, but quite modest progress.
>> On 13 May 2016 at 12:55, Darren Cook <darren at dcook.org> wrote:
>>> Google have trained a neural net (part of publicizing their open-source
>>> TensorFlow framework?) to parse syntax, claiming it is the world's best:
>>> I just wanted to quote this bit, on performance (they've called it
>>> Parsey McParseface):
>>> "Parsey McParseface recovers individual dependencies between words
>>> with over 94% accuracy, ... While there are no explicit studies in the
>>> literature about human performance, we know from our in-house annotation
>>> projects that linguists trained for this task agree in 96-97% of the
>>> cases ... Sentences drawn from the web are a lot harder to analyze,
>>> ...[it] achieves just over 90% of parse accuracy on this dataset. "
>>> Are there really no studies of human performance?! Surely some professor
>>> has hinted to their PhD students that it is a nice bit of relatively
>>> easy linguistics research that should also get them cited a lot...
>>> (I was mainly curious what the human performance gap between Penn
>>> Treebank and Google WebTreebank would be; if it would be more or less
>>> than the 4% gap for the deep learning algorithm.)
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora at uib.no