[Corpora-List] Gender dataset

Amaç Herdağdelen amac at herdagdelen.com
Fri Apr 13 17:02:44 CEST 2012

Hi Kiran,

I combined the 1990 US Census name data with the US Social Security Administration's popular baby name statistics for every year from 1960 to 2010:


This repository also contains some simple Python scripts that may help you get started. For an evaluation of the name-based heuristics, have a look at this manuscript:

http://clic.cimec.unitn.it/amac/twitter_ngram/Herdagdelen2012-RTC-draft.pdf (Section 3, page 8).
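To give a rough idea of what a name-based heuristic can look like, here is a minimal sketch. It is not one of the scripts from the repository and not the exact method evaluated in the manuscript. The function names, the (name, sex, count) record layout, the 0.9 threshold, and the toy counts are all assumptions made up for illustration.

```python
from collections import defaultdict


def build_counts(records):
    """Aggregate (name, sex, count) records into per-name totals.

    The record layout is an assumption for this sketch; adapt it to
    whatever format your name lists actually use.
    """
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for name, sex, count in records:
        counts[name.lower()][sex] += count
    return counts


def guess_gender(name, counts, threshold=0.9):
    """Return 'F', 'M', or None if the name is unknown or ambiguous.

    A name is labeled only when at least `threshold` of its recorded
    occurrences belong to one gender (0.9 is an arbitrary choice).
    """
    entry = counts.get(name.lower())
    if entry is None:
        return None
    total = entry["F"] + entry["M"]
    if total == 0:
        return None
    share_f = entry["F"] / total
    if share_f >= threshold:
        return "F"
    if 1 - share_f >= threshold:
        return "M"
    return None


# Toy records with made-up counts, for illustration only
# (these are NOT real Census or SSA figures).
TOY_RECORDS = [
    ("Mary", "F", 980), ("Mary", "M", 20),
    ("John", "M", 990), ("John", "F", 10),
    ("Casey", "F", 450), ("Casey", "M", 550),
]

if __name__ == "__main__":
    counts = build_counts(TOY_RECORDS)
    for name in ["Mary", "john", "Casey", "Kiran"]:
        print(name, guess_gender(name, counts))
```

Returning None for ambiguous or unseen names keeps the heuristic conservative: you trade coverage for precision, and the threshold controls that trade-off.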

There is also an older Perl module:


by Jon Orwant and Eamon Daly, which has an option for fuzzy search -- based on phonetic similarity of the names, I believe.
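As an illustration of what fuzzy matching by phonetic similarity can mean, here is a small sketch using the classic American Soundex code. I have not checked which algorithm the Perl module actually uses, so treat Soundex as a stand-in, and treat the helper names (`soundex`, `fuzzy_candidates`) as hypothetical.

```python
# Letter-to-digit table for American Soundex.
CODES = {}
for digit, letters in enumerate(
    ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1
):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return the 4-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels (and unknown letters) map to ""
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def fuzzy_candidates(name, known_names):
    """Return the known names that share a Soundex code with `name`."""
    target = soundex(name)
    return [n for n in known_names if soundex(n) == target]


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"))
    print(fuzzy_candidates("Stephan", ["Steven", "Stephen", "Mary"]))
```

A misspelled or rare variant can then be mapped to the known names with the same code before applying a name-based gender lookup. Note that Soundex always keeps the first letter, so variants such as "Catherine" and "Kathryn" will not match each other.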


