I have tried to send this one to the list:
Jonathan Doyle and Vlado Keselj, Automatic Categorization of Author Gender via N-Gram Analysis. In The 6th Symposium on Natural Language Processing, SNLP'2005, Chiang Rai, Thailand, December 2005.
but the message is larger than 40K, so it is waiting for the moderator's approval.
On Thu, 4 Nov 2010, Rada Mihalcea wrote:
> There are several papers that looked at automatic gender identification,
> see for instance:
> M. Koppel, S. Argamon and A. Shimoni, Automatically categorizing
> written texts by author gender, Literary and Linguistic Computing 17(4),
> November 2002, pp. 401-412.
> Hugo Liu and Rada Mihalcea, Of Men, Women, and Computers: Data-Driven
> Gender Modeling for Improved User Interfaces, in Proceedings of the
> International Conference on Weblogs and Social Media (ICWSM), Boulder,
> Colorado, March 2007.
> Arjun Mukherjee and Bing Liu, Improving Gender Classification of Blog
> Authors, in Proceedings of the Conference on Empirical Methods in Natural
> Language Processing (EMNLP-10), Boston, Massachusetts, October 9-11, 2010.
> A search for "gender classification" or "gender identification" will
> most likely reveal quite a few more papers.
> On Thu, 4 Nov 2010, Diana Maynard wrote:
> >I wondered that as well.
> >On another note, I guess the success of it depends critically on at
> >least two things:
> >(1) how good the gender guesser is (I didn't see any statistics on that,
> >but I didn't search extensively).
> >(2) (which is related) - the proportion of American names in the twitter
> >corpus (since I think the guesser used is based solely on American first
> >names) - and this could have some impact. Even the differences in
> >first-name gender between the US and Britain are not insignificant.
> >On a related note, has anyone done the reverse and used vocabulary
> >selection to help identify the gender of the speaker, with any success?
> >I'm sure people must have played with this idea.
> >I'm interested in techniques to improve person gender recognition - in
> >my experience, using pre-built lists of male and female names and simple
> >frequency information is often not accurate enough. Again, I haven't
> >searched extensively for this, but if anyone happens to know offhand
> >about it I'd be interested.
> >On 04/11/2010 09:51, Adam Kilgarriff wrote:
> >> Cool!
> >> So, what is it about 3? (see
> >> http://labs.buradayiz.webfactional.com/gender/query/query?words=1+2+3+4+5+6+7+8+9)
> >> You must have a theory.
> >> adam
> >Corpora mailing list
> >Corpora at uib.no