[Corpora-List] Syntactic parsing performance by humans?

Koos Wilt kooswilt at gmail.com
Fri May 13 21:37:02 CEST 2016

OK, this, at least, conforms to the original request: "Parsing, comparisons between parsers, and other comparative studies, where components are viewed as modular entities in an entire system, are the subject of [2, 3, 9, 23, 25, 28, 32, 33]." The numbers refer to my bibliography. The bibliography was compiled for the Wageningen University Department of Plant Sciences for Ms Judith Risse, PhD candidate under Prof Jack Leunissen.



2016-05-13 21:31 GMT+02:00 Koos Wilt <kooswilt at gmail.com>:

> It is not so much an overview of parser performance as it is a
> bibliography. Many of the entries, however, do contain parser evaluations.
> Hope it's still useful.
> -K
> 2016-05-13 20:26 GMT+02:00 Koos Wilt <kooswilt at gmail.com>:
>> Also please allow me to give a plug for the Stanford Parser. I cannot
>> claim whether it performs worse or better than Google's, but it's become
>> my trusty old war-horse.
>> 2016-05-13 20:24 GMT+02:00 Koos Wilt <kooswilt at gmail.com>:
>>> Bob Berwick
>>> 20:13 (10 minutes ago)
>>> to me
>>> would be useful for the list and the community. please do.
>>> Koos Wilt <kooswilt at gmail.com>
>>> 20:14 (9 minutes ago)
>>> to Bob
>>> Coming up tomorrow or so.
>>> 2016-05-13 19:51 GMT+02:00 Koos Wilt <kooswilt at gmail.com>:
>>>> I wrote an overview of the performance of parsers about 4 years ago.
>>>> Would sending it somewhere (e.g. to Mr Brew) be helpful to anyone? It's on
>>>> my other laptop so I have to dig for it.
>>>> Best,
>>>> -K
>>>> 2016-05-13 19:30 GMT+02:00 chris brew <cbrew at acm.org>:
>>>>> It is an unarguable fact that Google's parser gets a higher score, on
>>>>> the metrics chosen, which are completely standard in the NLP community.
>>>>> What is really being measured is the percentage of correct links in a
>>>>> graph that connects words to words via labeled edges. If, as is common,
>>>>> there are many words in the sentence, there will be many links too, and
>>>>> many opportunities for mistakes. You could get a 90% score and still have a
>>>>> mistake or two in nearly every sentence.
>>>>> Whether this quality level is OK depends entirely on what use you plan
>>>>> to make of the graph that has been produced.
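The arithmetic behind "a 90% score and still a mistake or two in nearly every sentence" can be sketched as follows. The figure of roughly 20 dependency links per sentence, and the assumption that errors are independent across links, are illustrative simplifications, not figures from the thread:

```python
# Back-of-envelope check: with 90% per-link accuracy and ~20 links
# per sentence, how often is a whole sentence parsed without error?
# Assumes errors are independent across links -- a simplification.

def fully_correct_rate(per_link_accuracy: float, links_per_sentence: int) -> float:
    """Probability that every link in a sentence is correct."""
    return per_link_accuracy ** links_per_sentence

def expected_errors(per_link_accuracy: float, links_per_sentence: int) -> float:
    """Expected number of wrong links per sentence."""
    return (1 - per_link_accuracy) * links_per_sentence

print(fully_correct_rate(0.90, 20))  # ~0.12: only about 12% of sentences are error-free
print(expected_errors(0.90, 20))     # ~2.0: "a mistake or two" per sentence
```

So even a headline per-link score of 90% leaves only about one sentence in eight fully correct, which is exactly why the intended downstream use matters.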
>>>>> The Penn Treebank was made many years ago, with version 2 coming out
>>>>> in 1995. We have learnt a lot about how to annotate corpora and evaluate
>>>>> parsing since then. The Web Treebank is much newer, and reflects painfully
>>>>> learned best practices, so should be good quality, but is on the other hand
>>>>> dealing with much messier language, so performance scores are lower.
>>>>> The current practice of evaluating individual dependencies was
>>>>> introduced as a result of major deficiencies in the first evaluation
>>>>> metrics that were used. It has the major plus of being transparent and
>>>>> straightforward. I believe that improvements in the metric will usually
>>>>> translate into improvements for downstream tasks that use parses as
>>>>> input, which I wasn't so sure of with earlier metrics. This is progress, but
>>>>> quite modest progress.
>>>>> On 13 May 2016 at 12:55, Darren Cook <darren at dcook.org> wrote:
>>>>>> Google have trained a neural net (part of publicizing their open-source
>>>>>> TensorFlow framework?) to parse syntax, claiming it is the world's best:
>>>>>> http://googleresearch.blogspot.co.uk/2016/05/announcing-syntaxnet-worlds-most.html
>>>>>> I just wanted to quote this bit on performance (they've called it
>>>>>> Parsey McParseface):
>>>>>> "Parsey McParseface recovers individual dependencies between words
>>>>>> with over 94% accuracy, ... While there are no explicit studies in the
>>>>>> literature about human performance, we know from our in-house
>>>>>> annotation
>>>>>> projects that linguists trained for this task agree in 96-97% of the
>>>>>> cases ... Sentences drawn from the web are a lot harder to analyze,
>>>>>> ...[it] achieves just over 90% of parse accuracy on this dataset. "
>>>>>> Are there really no studies of human performance?! Surely some
>>>>>> professor
>>>>>> has hinted to their PhD students that it is a nice bit of relatively
>>>>>> easy linguistics research that should also get them cited a lot...
>>>>>> (I was mainly curious what the human performance gap between Penn
>>>>>> Treebank and Google WebTreebank would be; if it would be more or less
>>>>>> than the 4% gap for the deep learning algorithm.)
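The quoted per-link numbers can be turned into a rough sentence-level comparison, which shows how quickly a small per-link gap compounds. Again, ~20 links per sentence and independent errors are illustrative assumptions, not figures from the post:

```python
# How the per-link accuracy gap compounds into whole-sentence accuracy.
# ~20 links per sentence and independent errors are simplifying assumptions.

LINKS_PER_SENTENCE = 20

for label, per_link in [
    ("Parsey McParseface, newswire", 0.94),
    ("trained human annotators", 0.965),   # midpoint of the quoted 96-97%
    ("Parsey McParseface, web text", 0.90),
]:
    fully_correct = per_link ** LINKS_PER_SENTENCE
    print(f"{label}: {fully_correct:.0%} of sentences with no errors")
```

Under these assumptions the 94% parser gets roughly 29% of sentences fully right, versus roughly 49% for human annotators at 96.5%, so the modest-looking 4% per-link gap roughly halves the sentence-level yield.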
>>>>>> Darren
>>>>>> _______________________________________________
>>>>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>>>>> Corpora mailing list
>>>>>> Corpora at uib.no
>>>>>> http://mailman.uib.no/listinfo/corpora