I'm referring to data that would be usable for commercial purposes, unlike data provided through the Linguistic Data Consortium (LDC) for research purposes. For a commercial organization, the trade-off is between the opportunity to recoup the expense of annotating a data set and the risk of accelerating the time to market of a competing product or service, or of promoting its sales at one's own expense.
My premise is that a software system's greatest value lies in what it can do with the training data rather than in the training data itself. But what considerations do others see?
-- 
Seth Grimes
Alta Plana Corp, analytical computing & data management
Intelligent Enterprise magazine (CMP), Contributing Editor
grimes at altaplana.com   http://altaplana.com   +1 301-270-0795