[Corpora-List] HiDEx version 0.03 released along with a sample vector set and word list.

Cyrus Shaoul cyrus.shaoul at ualberta.ca
Fri May 21 00:01:29 CEST 2010

Dear Fellow Corpora List members:

A new version of our implementation of the HAL model is now available, and for those who do not wish to process a corpus, a new sample vector set has been released.

HiDEx is released as GPLv3 source code for Mac OS X and Unix (no Microsoft Windows version available). It is available here:


(There is a link to the documentation on this web page. Please read it before downloading the software.)

The changes since version 0.02 are:

1) Input word lists may now be in uppercase or lowercase.
2) Threshold size calculations are dynamic and are compatible with all the available similarity metrics.
3) Other minor bugs fixed.

Our new vector set was made using the Westbury Lab Wikipedia corpus (which is also available for download on our site). It has vectors for over 50,000 words, and can be used with HiDEx to calculate word-pair co-occurrence similarity, word neighborhoods and other metrics. When using the sample vectors instead of a corpus there is no way to adjust parameters such as window size, window weights or vector normalization method.
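
For readers new to the HAL model, the following is a minimal Python sketch of the general idea behind window-weighted co-occurrence vectors and a similarity measure computed on them. It is not HiDEx code and does not reflect HiDEx's actual parameters or file formats: the function names (hal_vectors, cosine), the linear ramp weighting, the window size and the toy sentence are all assumptions chosen for illustration.

```python
from collections import defaultdict
import math


def hal_vectors(tokens, window=5):
    """Build HAL-style vectors (illustrative only, not the HiDEx implementation).

    Each target word gets weighted counts of the words that follow it and the
    words that precede it within `window` positions. Closer words get a larger
    weight (an assumed linear ramp: window - distance + 1).
    """
    vecs = defaultdict(lambda: defaultdict(float))
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            weight = window - d + 1
            if i + d < len(tokens):
                vecs[target][("after", tokens[i + d])] += weight
            if i - d >= 0:
                vecs[target][("before", tokens[i - d])] += weight
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical); a real corpus would be far larger.
tokens = "the cat sat on the mat the dog sat on the rug".split()
vecs = hal_vectors(tokens, window=2)
print(cosine(vecs["cat"], vecs["dog"]))
```

In this sketch, changing the window argument or the weight formula changes every vector, which is why such parameters cannot be adjusted once a precomputed vector set is used.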

The vector set is available here.


Please make sure to download and compile HiDEx before using this vector set. (The vector set is 1 GB in size when compressed, so please use BitTorrent when downloading unless you have access to Internet2.)

Also, we have made a list of over 50,000 English words with their neighborhood densities calculated using the Wikipedia corpus. It is available at:




--
=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}
Cyrus Shaoul
http://www.psych.ualberta.ca/~westburylab/
University of Alberta
=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}
