[Corpora-List] Gender dataset

Amaç Herdağdelen amac at herdagdelen.com
Fri Apr 13 17:02:44 CEST 2012


Hi Kiran,

I combined the 1990 US Census name data with the US Social Security Administration's statistics on popular baby names for every year between 1960 and 2010:

https://github.com/amacinho/Name-Gender-Guesser

In this repository there are also some simple Python scripts that may help you get started. For an evaluation of the name-based heuristics, have a look at this manuscript:

http://clic.cimec.unitn.it/amac/twitter_ngram/Herdagdelen2012-RTC-draft.pdf (Section 3, page 8).
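
To give a rough idea of what a name-based heuristic of this kind can look like, here is a minimal sketch. It is not the code from the repository: the function names, the 0.8 threshold, and the counts are all assumptions made up for illustration, and the real data files will have their own format.

```python
from collections import defaultdict

# Toy counts, invented purely for illustration (NOT real Census or SSA figures).
TOY_COUNTS = [
    ("mary", "F", 980),
    ("mary", "M", 20),
    ("john", "M", 990),
    ("john", "F", 10),
    ("casey", "F", 450),
    ("casey", "M", 550),
]


def build_table(rows):
    """Aggregate (name, gender, count) rows into {name: {"F": n, "M": n}}."""
    table = defaultdict(lambda: {"F": 0, "M": 0})
    for name, gender, count in rows:
        table[name.lower()][gender] += count
    return dict(table)


def guess_gender(name, table, threshold=0.8):
    """Return "F", "M", or None when the name is unseen or too ambiguous.

    The threshold is a hypothetical parameter: the majority gender is
    returned only if it accounts for at least that share of the counts.
    """
    counts = table.get(name.strip().lower())
    if counts is None:
        return None
    total = counts["F"] + counts["M"]
    if total == 0:
        return None
    gender = "F" if counts["F"] >= counts["M"] else "M"
    share = counts[gender] / total
    return gender if share >= threshold else None


table = build_table(TOY_COUNTS)
print(guess_gender("Mary", table))
print(guess_gender("Casey", table))
```

Returning None for unseen or ambiguous names keeps the heuristic conservative; lowering the threshold trades precision for coverage.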

There is also an older Perl module:

http://search.cpan.org/~edaly/Text-GenderFromName-0.32/GenderFromName.pm

by Jon Orwant and Eamon Daly, which has an option for fuzzy search -- based on phonetic similarity of the names, I believe.

Best,

Amaç


