[Corpora-List] Chinese Name Gender Recognition

Xiaofei Lu xflu at ling.ohio-state.edu
Thu Dec 22 20:16:01 CET 2005

Are you planning to look at context at all? The pronoun resolution idea
should definitely help. Looking at the context in which a personal
name appears may help a bit, too, e.g., in cases where one or more names
appear after phrases like "member(s) of the women's team".
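
To make the context idea concrete, here is a rough sketch in Python. It is
not real pronoun resolution, only a proximity vote: it counts gendered
pronouns and title words within a fixed window around each mention of the
name. The cue lists, the window size, and the function name are my own
assumptions for illustration, not anything tested on real data.

```python
import re

# Illustrative cue lists (assumptions): gendered third-person pronouns
# plus a few gender-specific titles and team words.
FEMALE_CUES = ("她", "女士", "小姐", "女队")
MALE_CUES = ("他", "先生", "男队")


def guess_gender_from_context(text, name, window=20):
    """Vote on a name's gender from cue words near each mention of it."""
    female = male = 0
    for match in re.finditer(re.escape(name), text):
        # Take up to `window` characters on each side of the mention,
        # leaving out the name itself.
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        context = text[start:match.start()] + " " + text[match.end():end]
        female += sum(context.count(cue) for cue in FEMALE_CUES)
        male += sum(context.count(cue) for cue in MALE_CUES)
    if female > male:
        return "female"
    if male > female:
        return "male"
    return "unknown"  # no cues nearby, or a tie


print(guess_gender_from_context("女队成员王芳表示,她对比赛很有信心。", "王芳"))
print(guess_gender_from_context("张华今天到达北京。", "张华"))
```

A plain substring count like this will misfire on words such as 其他
("other"), which contains 他, so a real system would need word segmentation
and proper coreference. The vote could also be fed in as one more feature
alongside the name-based classifier instead of replacing it.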


On Thu, 22 Dec 2005, Heng Ji wrote:


> I believe your IR idea will boost performance. You may also want to
> try applying pronoun reference resolution before gender disambiguation,
> since Chinese personal pronouns are clearly distinguished by gender. If
> you could link a pronoun in the context with the name candidate, that
> might help. In addition, a few gender-specific title words in the context
> would be useful too.


> I would guess that lexical information alone can accurately recognize name
> genders for people born before 1980, but it might not be enough for
> names that appeared later: many of those have been made intentionally
> gender-neutral. :) So you may want to incorporate time-frame
> information into your system.


> Heng


> On Thu, 22 Dec 2005, Jun Lang wrote:


>> Hi Mark Lewellen,
>> Thanks for your interest in this problem.
>> Yes. After doing some baseline research, I found that there are many
>> related problems in gender recognition based on Chinese names. Using
>> only the name may not achieve a better result. I am considering
>> combining some other resources to disambiguate the gender. For example, I
>> could query a search engine for gender-designating words to improve the
>> final accuracy.
>> What do you think about it?


>> Thanks!


>> Have a nice Christmas Eve and Christmas Day!


>> Best wishes,
>> Bill_Lang(Jun Lang): Ph.D Candidate
>> Information Retrieval Laboratory
>> Harbin Institute of Technology
>> Mail: bill_lang at gmail.com
>> Homepage: http://ir.hit.edu.cn/~bill_lang



>> -----Original Message-----
>> From: Mark Lewellen [mailto:lewellen at erols.com]
>> Sent: Wednesday, December 21, 2005 11:49 PM
>> To: 'Jun Lang'; 'Xiaofei Lu'
>> Cc: corpora at uib.no
>> Subject: RE: [Corpora-List] Chinese Name Gender Recognition


>> Since Chinese given names are not limited to a set of
>> lexical items that are prototypically 'names' (i.e., they
>> can be just about any lexical item), Chinese given names,
>> as you probably know, often give no clue about gender.
>> There has been some discussion of 'traits' that are
>> more feminine or masculine and would be reflected in names,
>> but there remains a lot of ambiguity. I doubt there is any
>> statistical method, algorithm, or even native speaker that
>> can make up for that problem!


>> Mark Lewellen


>>> -----Original Message-----
>>> From: owner-corpora at lists.uib.no
>>> [mailto:owner-corpora at lists.uib.no] On Behalf Of Jun Lang
>>> Sent: Tuesday, December 13, 2005 7:31 AM
>>> To: 'Xiaofei Lu'
>>> Cc: corpora at uib.no
>>> Subject: [Corpora-List] Re: [Corpora-List] Chinese Name Gender Recognition



>>> Yeah! There are many names that could be used for both males and
>>> females. It is a difficult problem. So far I have done some simple
>>> research on this topic. Recently, I have been trying to get more data.
>>> Since the parameter space is very large, decision trees cannot produce
>>> the final result quickly. I want to use a Bayes model again.


>>> Can you give me some ideas about it? Thanks a lot!


>>> Best wishes,
>>> Jun Lang


>>> -----Original Message-----
>>> From: Xiaofei Lu [mailto:xflu at ling.ohio-state.edu]
>>> Sent: December 13, 2005 13:56
>>> To: Jun Lang
>>> Subject: Re: [Corpora-List] Chinese Name Gender Recognition


>>> Interesting. What is the baseline, and how do you establish it?
>>> Many names can be either male or female, can't they?


>>> On Tue, 13 Dec 2005, Jun Lang wrote:


>>>> Hi all Corpora members,

>>>> I am now studying Chinese name gender recognition. The input is a
>>>> Chinese name, and the output is the corresponding gender. I used the
>>>> decision tree method, but the final accuracy is only about 70%.

>>>> Do you know of any other method that can achieve higher accuracy? And
>>>> has anybody done similar research?


>>>> Thanks a lot!




>>>> Best wishes,
>>>> Bill_Lang(Jun Lang): Ph.D Candidate
>>>> Information Retrieval Laboratory
>>>> Harbin Institute of Technology
>>>> Mail: bill_lang at gmail.com
>>>> Homepage: http://ir.hit.edu.cn/~bill_lang
