From the abstract at the pointy end of this pointer:
> The exercise identifies the state-of-the-art for fine-grained word sense
> disambiguation, where training data is available, as 74–78% correct, with
> a number of algorithms approaching this level of performance. For systems
> that did not assume the availability of training data, performance was
> markedly lower and also more variable. Human inter-tagger agreement was
> high, with the gold standard taggings being around 95% replicable.
Implication: Taking the error rate as roughly 25% (the abstract's 74–78% correct), a state-of-the-art program would make about 75 errors on a 300-word page of text. That averages two errors per 8-word sentence, or five errors per 20-word sentence.
For the "gold" standard, there would still be 15 errors in a 300-word page. Miss Elliott, my high-school English teacher, wouldn't give anyone a gold star for 15 errors per page.
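The arithmetic behind those figures can be checked with a minimal sketch. It assumes, as the estimate above implicitly does, that every word on the page counts as one tagging decision and that the quoted accuracy applies uniformly; the function name `expected_errors` is made up for illustration and is not from the abstract.

```python
def expected_errors(accuracy: float, n_words: int) -> float:
    """Expected number of wrongly tagged words.

    Assumption: each of the n_words is tagged once, and each tag is
    wrong with probability (1 - accuracy).
    """
    return (1.0 - accuracy) * n_words


# Accuracy figures quoted in the abstract; 0.75 is the rounded midpoint
# used for the "about 75 errors" estimate.
scenarios = [
    ("state of the art, low end", 0.74),
    ("state of the art, rounded", 0.75),
    ("state of the art, high end", 0.78),
    ("gold standard replicability", 0.95),
]

for label, accuracy in scenarios:
    # Report errors per 300-word page, per 8-word and per 20-word sentence.
    page = expected_errors(accuracy, 300)
    short = expected_errors(accuracy, 8)
    long_ = expected_errors(accuracy, 20)
    print(f"{label}: {page:.1f} per page, {short:.1f} per 8 words, {long_:.1f} per 20 words")
```

Across the quoted 74–78% range the per-page figure runs from about 66 to about 78 errors, so "about 75" sits toward the pessimistic end of that range.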