2016-05-13 19:30 GMT+02:00 chris brew <cbrew at acm.org>:
> It is an unarguable fact that Google's parser gets a higher score, on the
> metrics chosen, which are completely standard in the NLP community. What is
> really being measured is the percentage of correct links in a graph that
> connects words to words via labeled links. If, as is common, there are
> many words in the sentence, there will be many links too, and many
> opportunities for mistakes. You could get a 90% score and still have a
> mistake or two in nearly every sentence.
> Whether this quality level is OK depends entirely on what use you plan to
> make of the graph that has been produced.
>
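The "90% score, yet a mistake or two in nearly every sentence" point is just the arithmetic of per-link accuracy compounding across a sentence. A quick sketch (assuming, simplistically, that link errors are independent):

```python
# Probability that a whole sentence is parsed with no errors, given
# per-link accuracy p, assuming (simplistically) independent link errors.
def prob_sentence_correct(p: float, n_links: int) -> float:
    return p ** n_links

# A ~20-word sentence has roughly 20 dependency links.
p, n = 0.90, 20
print(f"P(sentence fully correct) = {prob_sentence_correct(p, n):.3f}")  # ~0.122
print(f"Expected link errors per sentence = {n * (1 - p):.1f}")          # 2.0
```

So at 90% per-link accuracy, only about one 20-word sentence in eight comes out entirely clean, and the average sentence carries two wrong links.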
> The Penn Treebank was made many years ago, with version 2 coming out in
> 1995. We have learnt a lot about how to annotate corpora and evaluate
> parsing since then. The Web Treebank is much newer, and reflects painfully
> learned best practices, so should be good quality, but is on the other hand
> dealing with much messier language, so performance scores are lower.
>
> The current practice of evaluating individual dependencies was introduced
> as a result of major deficiencies in the first evaluation metrics that were
> used. It has the major plus of being transparent and straightforward. I
> believe that improvements in the metric will usually translate into
> improvements for downstream tasks that use parses as input, and I wasn't
> so sure with the earlier metrics. This is progress, but quite modest progress.
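The per-dependency evaluation being described is conventionally reported as unlabeled attachment score (UAS: fraction of words given the correct head) and labeled attachment score (LAS: correct head *and* correct relation label). A minimal sketch with a made-up three-word sentence — this is the standard textbook computation, not Google's evaluation code:

```python
# Toy attachment-score computation over per-word (head, label) arcs.
# gold and pred map each token index to its (head index, relation label).
def attachment_scores(gold, pred):
    uas = sum(pred[i][0] == head for i, (head, _) in gold.items()) / len(gold)
    las = sum(pred[i] == arc for i, arc in gold.items()) / len(gold)
    return uas, las

# "dogs bark loudly", with the artificial ROOT at index 0.
gold = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "advmod")}
pred = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "amod")}  # right head, wrong label
uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")  # UAS = 1.00, LAS = 0.67
```

The transparency chris brew mentions is visible here: every word contributes exactly one scored decision, so an error is always attributable to a specific arc.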
> On 13 May 2016 at 12:55, Darren Cook <darren at dcook.org> wrote:
>> Google have trained a neural net (part of publicizing their open-source
>> TensorFlow framework?) to parse syntax, claiming it is the world's best:
>> I just wanted to quote this bit, on performance (they've called it
>> Parsey McParseface):
>> "Parsey McParseface recovers individual dependencies between words
>> with over 94% accuracy, ... While there are no explicit studies in the
>> literature about human performance, we know from our in-house annotation
>> projects that linguists trained for this task agree in 96-97% of the
>> cases ... Sentences drawn from the web are a lot harder to analyze,
>> ...[it] achieves just over 90% of parse accuracy on this dataset. "
>> Are there really no studies of human performance?! Surely some professor
>> has hinted to their PhD students that it is a nice bit of relatively
>> easy linguistics research, one that should also get them cited a lot...
>> (I was mainly curious what the human performance gap between Penn
>> Treebank and Google WebTreebank would be; if it would be more or less
>> than the 4% gap for the deep learning algorithm.)
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no