For more examples, Google the phrase "damn lies and benchmarks".
Vendors who design new hardware and software have been advertising their performance on "benchmarks" for over half a century. But customers have learned that benchmarks don't answer the question whether a particular system will solve their problem.
NLP is primarily a branch of engineering. The challenge for an engineer is not to get a better score on some "metric", but to solve a client's problem within the constraints of budgets, deadlines, and available resources.
John