[Corpora-List] Gender dataset

John D Burger john at mitre.org
Fri Apr 13 15:45:12 CEST 2012


kiran wrote:


> Is there any gender dataset available?
> It should ideally be a first name-gender mapping
>
> Ex: Abraham-Male or Abraham_Lincoln-Male

There are the name lists from the 1990 US Census, which I believe have been used in a lot of language research:

http://www.census.gov/genealogy/names/

These comprise three files: male given names, female given names, and surnames, each with frequency information. From the first two files, you could construct a gender distribution for each given name.
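
As a rough sketch of that last step in Python: this assumes each line of the two given-name files is whitespace-separated, with the name in the first column and a frequency (percent of that gender's sample) in the second. Check the actual files before relying on that layout. The function names and the sample lines are made up for illustration. Note also that combining the two frequencies directly assumes the male and female samples are about the same size.

```python
def parse_name_frequencies(lines):
    """Map each name to its frequency, read from whitespace-separated lines.

    Assumed layout: name in column 1, frequency in column 2.
    Any further columns are ignored.
    """
    freqs = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        freqs[fields[0].upper()] = float(fields[1])
    return freqs


def gender_distribution(male_freqs, female_freqs):
    """Return {name: {"male": p, "female": 1 - p}} over names in either list."""
    dist = {}
    for name in set(male_freqs) | set(female_freqs):
        m = male_freqs.get(name, 0.0)
        f = female_freqs.get(name, 0.0)
        total = m + f
        if total > 0:
            dist[name] = {"male": m / total, "female": f / total}
    return dist


# Made-up sample lines, NOT real census figures.
male_sample = ["JOHN 3.0 3.0 1", "LESLIE 0.25 3.25 2"]
female_sample = ["MARY 2.5 2.5 1", "LESLIE 0.75 3.25 2"]

dist = gender_distribution(
    parse_name_frequencies(male_sample),
    parse_name_frequencies(female_sample),
)
print(dist["LESLIE"])
```

With the real files, you would pass an open file handle for each list in place of the sample lines. A name that appears in only one file gets probability 1.0 for that gender, so you may want to smooth or threshold rare names.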

- John Burger

MITRE


