Thank you, John, for bringing this to our attention. We were of course not trying to suggest that statistical NLP started in a vacuum. Ideas about the statistical analysis of language have been around for a long time, and the examples you gave are great evidence of that. We were mostly referring to NLP in its "current form" -- modern NLP and its modern use of statistical analysis. We should have been more careful with our phrasing.

Best,
Shay

> Some historical points:
> 1. Statistical methods for content analysis were pioneered by
> Lasswell (1948) and Berelson (1952), and they were computerized
> as soon as computers became widely available. For references,
> see http://en.wikipedia.org/wiki/Content_analysis
> 2. Charles C. Fries pioneered the use of corpora in language
> analysis from the 1920s to the 1950s. For references, see
> http://clu.uni.no/icame/ij34/Fries.pdf
> 3. As early as 1947, Warren Weaver recognized the potential
> for computers in machine translation. He was instrumental
> in getting funding for it. He was also the coauthor with
> Claude Shannon of _The Mathematical Theory of Communication_
> (1949). That book stimulated a considerable body of research
> in the application of statistical methods to language analysis.
> 4. Chomsky's thesis adviser, Zellig Harris, pioneered transformational
> methods. Unlike Chomsky, Harris emphasized the use of corpora and
> statistics. See the collection, _The Legacy of Zellig Harris_:
> http://hum.uchicago.edu/jagoldsm/Papers/ZelligHarrisLgProofs.pdf
> 5. Victor Yngve, a pioneer in MT, was also a pioneer in using
> statistics in language analysis. Hutchins summarizes both
> in http://aclweb.org/anthology/J/J12/J12-3001.pdf
> 6. As the director of the MT project at MIT, Yngve hired Chomsky as
> a promising young PhD whose syntactic methods might be useful.
> Chomsky also taught a course in linguistics and published his
> notes as _Syntactic Structures_ (1957). In that book, Chomsky
> strongly rejected statistical methods and the use of corpora.
> 7. In the 1980s, Fred Jelinek used statistical methods for a project
> on speech recognition at IBM Research. John Cocke suggested that
> similar methods might be useful for MT. In those days, they
> swamped the capacity of the largest IBM mainframes. By the 1990s,
> they could run on minicomputers and workstations.
> John
