[Corpora-List] Parallel Word Lists

David L. Hoover david.hoover at nyu.edu
Mon Oct 19 17:22:44 CEST 2009


I often need what I'll call a parallel word list: a combined word frequency list for a corpus of texts that also gives the frequency of each word in each individual text, including zero frequencies. It looks like this (the entries are in descending order of frequency in the entire corpus):

        Text 1   Text 2   Text 3
the     0.0610   0.0428   0.0551
and     0.0387   0.0294   0.0249
to      0.0265   0.0287   0.0272
of      0.0252   0.0291   0.0326
a       0.0239   0.0238   0.0207
city    0.0000   0.0015   0.0002
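
For concreteness, here is a minimal Python sketch of how such a list can be computed. It only illustrates the definition above. The tokenization rule (lowercased runs of letters and apostrophes) and the names `tokenize` and `parallel_word_list` are assumptions made for the example, not a description of how WordSmith Tools or any other package works.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a "word" is a lowercased run of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def parallel_word_list(texts):
    """Build a parallel word list.

    texts: dict mapping a text name to its raw text.
    Returns a list of (word, [relative frequency in each text]) pairs,
    ordered by descending frequency in the whole corpus.
    """
    # Per-text word counts and per-text token totals.
    counts = {name: Counter(tokenize(body)) for name, body in texts.items()}
    totals = {name: sum(c.values()) for name, c in counts.items()}

    # Pooled counts over the whole corpus determine the row order.
    corpus = Counter()
    for c in counts.values():
        corpus.update(c)
    ordered = sorted(corpus, key=lambda w: (-corpus[w], w))

    rows = []
    for word in ordered:
        # A Counter returns 0 for a missing word, which gives the zero entries.
        freqs = [
            counts[name][word] / totals[name] if totals[name] else 0.0
            for name in counts
        ]
        rows.append((word, freqs))
    return rows


if __name__ == "__main__":
    sample = {
        "Text 1": "the cat and the dog",
        "Text 2": "the city",
    }
    print(" " * 8 + "".join(f"{name:>8}" for name in sample))
    for word, freqs in parallel_word_list(sample):
        print(f"{word:<8}" + "".join(f"{f:8.4f}" for f in freqs))
```

Ties in corpus frequency are broken alphabetically here; that is an arbitrary choice for the sketch.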

I have my own methods of doing this, and I know that WordSmith Tools will produce such a list using the "Detailed Consistency List" function, with View Column Totals, but I wonder if there are especially good publicly available (free) methods out there that I just haven't found.

Also, to be clear, I'm looking for a simple tool for users without any programming experience, so no Perl scripts, no UNIX, etc.

Thanks, David Hoover

--

David L. Hoover, Professor of English, NYU

212-998-8832 http://homepages.nyu.edu/~dh3/

Most of her friends had an anxious, haggard look, . . . Basil Ransom wondered who they all were; he had a general idea they were mediums, communists, vegetarians.

-- Henry James, The Bostonians (1886)


