[Corpora-List] Question about Named Entity Annotation

Saeedeh Momtazi saeedeh.momtazi at gmail.com
Wed Jul 26 06:49:06 CEST 2017


Dear All,

We are working on a Persian NER project. As part of the project, we are annotating a Persian corpus with NE tags. The tagging scheme has around 10 different tags, including "job", "religious", and "nationality".

We are not sure about the correct NE tags for words that refer to a group of people, whether based on their job, their religion, or their nationality.

For example, consider these sentences: (1) *Teachers* are working in this institute. (2) Muhammad is the prophet of *Muslims*. (3) The *Mongols* are bound together by a common heritage and ethnic identity. (4) The *Achaemenid Empire* was an empire based in Western Asia, founded by Cyrus the Great.

Here are the questions:

1- Are all the above examples NEs? I am fairly sure about items 2, 3, and 4 because they start with a capital letter. But what about item 1? The word "teachers" does not normally start with a capital letter. Should such a word, which refers to a group of people, be considered an NE?

2- As far as the NE tag is concerned, which tag should be used for the above examples: "person"/"group of people" or another NE type? For example, can we consider "teachers" as a "job" in this text, "Muslims" as "religious", or "Achaemenid Empire" as a "date", since it refers to a specific period of time?

3- If they should be tagged as "person", does it make sense to define a separate tag for "group of people"?

Thanks in advance for your help.

Best Regards, Saeedeh

*Saeedeh Momtazi, PhD*

*Assistant Professor*

*Computer Engineering and Information Technology Department*

*Amirkabir University of Technology*

*Tehran, Iran*

*Tel: +98 21 64542729*

*momtazi at aut.ac.ir*

*http://ceit.aut.ac.ir/~momtazi*


