[Corpora-List] Last CFP: NAACL Workshop on Vector Space Modeling for NLP

Shay Cohen scohen at cs.cmu.edu
Sat Feb 28 14:00:10 CET 2015

Thank you, John, for bringing this to our attention. We were of course not trying to suggest that statistical NLP started in a vacuum. Ideas about the statistical analysis of language have been floating around for a long time, and the examples you gave are great evidence of that. We were mostly referring to NLP in its "current form": modern NLP and its use of statistical analysis. We should have been more careful in how we phrased it.

Best,
Shay

On Thu, Feb 26, 2015 at 7:32 PM, John F Sowa <sowa at bestweb.net> wrote:

> On 2/24/2015 5:20 AM, Shay Cohen wrote:
>> NLP started with methods based on pure symbolic analysis of language.
>> Statistical methods were introduced to NLP in the 1990s,
>
> Some historical points:
> 1. Statistical methods for content analysis were pioneered by
> Lasswell (1948) and Berelson (1952), and they were computerized
> as soon as computers became widely available. For references,
> see http://en.wikipedia.org/wiki/Content_analysis
> 2. Charles C. Fries pioneered the use of corpora in language
> analysis from the 1920s to the 1950s. For references, see
> http://clu.uni.no/icame/ij34/Fries.pdf
> 3. As early as 1947, Warren Weaver recognized the potential
> for computers in machine translation. He was instrumental
> in getting funding for it. He was also the coauthor with
> Claude Shannon of _The Mathematical Theory of Communication_
> (1949). That book stimulated a considerable body of research
> in the application of statistical methods to language analysis.
> 4. Chomsky's thesis adviser, Zellig Harris, pioneered transformational
> methods. Unlike Chomsky, Harris emphasized the use of corpora and
> statistics. See the collection, _The Legacy of Zellig Harris_:
> http://hum.uchicago.edu/jagoldsm/Papers/ZelligHarrisLgProofs.pdf
> 5. Victor Yngve, a pioneer in MT, was also a pioneer in using
> statistics in language analysis. Hutchins summarizes both
> in http://aclweb.org/anthology/J/J12/J12-3001.pdf
> 6. As the director of the MT project at MIT, Yngve hired Chomsky as
> a promising young PhD whose syntactic methods might be useful.
> Chomsky also taught a course in linguistics and published his
> notes as _Syntactic Structures_ (1957). In that book, Chomsky
> strongly rejected statistical methods and the use of corpora.
> 7. In the 1980s, Fred Jelinek used statistical methods for a project
> on speech recognition at IBM Research. John Cocke suggested that
> similar methods might be useful for MT. In those days, they
> swamped the capacity of the largest IBM mainframes. By the 1990s,
> they could run on minicomputers and workstations.
> John