http://googleresearch.blogspot.co.uk/2016/05/announcing-syntaxnet-worlds-most.html
I just wanted to quote this bit on performance (they've called their parser Parsey McParseface):
"Parsey McParseface recovers individual dependencies between words with over 94% accuracy, ... While there are no explicit studies in the literature about human performance, we know from our in-house annotation projects that linguists trained for this task agree in 96-97% of the cases ... Sentences drawn from the web are a lot harder to analyze, ...[it] achieves just over 90% of parse accuracy on this dataset."
Are there really no studies of human performance?! Surely some professor has hinted to their PhD students that it is a nice bit of relatively easy linguistics research that should also get them cited a lot...
(I was mainly curious how large the human performance gap between Penn Treebank and Google WebTreebank would be, and whether it would be more or less than the roughly 4-point gap (94% vs. just over 90%) for the deep learning algorithm.)
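For anyone wanting to try the comparison, here is a rough Python sketch of the arithmetic. It assumes the quoted "accuracy" is a per-word head attachment score (the fraction of words whose governing word is identified correctly); the announcement doesn't say whether that's labeled or unlabeled, so that part is a guess. The same function could, in principle, score agreement between two annotators by treating one as "gold". The function names and the toy head sequences are made up for illustration and don't come from either treebank.

```python
# Hypothetical sketch: per-word attachment accuracy and the gap between two datasets.
# Each sentence is a list of head indices, one per word (0 = root).

def attachment_accuracy(gold_heads, pred_heads):
    """Fraction of words whose predicted head matches the gold head (unlabeled)."""
    if len(gold_heads) != len(pred_heads) or not gold_heads:
        raise ValueError("need two non-empty head sequences of equal length")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)


def gap_in_points(in_domain, out_of_domain):
    """Difference between two accuracies, in percentage points."""
    return round((in_domain - out_of_domain) * 100, 2)


# Inter-annotator agreement: treat annotator A as "gold", annotator B as "predicted".
annotator_a = [2, 0, 2, 3, 2]  # toy data, not from any treebank
annotator_b = [2, 0, 2, 2, 2]
print(attachment_accuracy(annotator_a, annotator_b))  # 0.8

# Parser figures quoted above (approximate): ~94% on Penn Treebank, just over 90% on web text.
print(gap_in_points(0.94, 0.90))  # 4.0
```

The human version of the question would need agreement figures like the first one measured separately on Penn Treebank and on web text, which is exactly the data that seems to be missing.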
Darren