[Corpora-List] Corpora, annotation schema/guidelines and NER systems for LOCATION or geonames

Amir Zeldes Amir.Zeldes at georgetown.edu
Sat Nov 6 20:12:30 CET 2021


Hi Salvador,

The English GUM corpus has nested, named and non-named entity annotation for all referring expressions, including places, across 12 spoken and written genres. It covers proper and common noun mentions as well as pronouns, with coreference resolution and named entity linking to Wikipedia, as described in this paper:

https://aclanthology.org/2021.law-1.18/

The corpus and more information can be found here:

https://corpling.uis.georgetown.edu/gum/

Best,

Amir

------------

Dr. Amir Zeldes

Assoc. Prof. of Computational Linguistics

Department of Linguistics

Georgetown University

1437 37th St. NW

Washington, DC 20057

https://corpling.uis.georgetown.edu/amir

From: corpora-bounces at uib.no <corpora-bounces at uib.no> On Behalf Of Salvador Lima
Sent: Thursday, October 28, 2021 11:54 AM
To: corpora at uib.no
Subject: [Corpora-List] Corpora, annotation schema/guidelines and NER systems for LOCATION or geonames

Dear all,

We are trying to get a more comprehensive view of the current NLP resources for the annotation, automatic recognition, and normalization/grounding of LOCATION, geoname, and other place-related entity types (for data in English, and particularly also in other languages).

We have already looked at the ENAMEX tagset (Location and its sub-tags) and guidelines, as well as ACE and CLIA.

We would really appreciate feedback on current NER and entity linking components, corpora, and also annotation guidelines for different languages, including English, Spanish, Italian, French, German, Portuguese, or Swedish. Anything with a special focus on movements and travels would also be really interesting.

Best regards,

--

Salvador Lima Lopez
RESEARCH ENGINEER
Life Sciences - Text Mining, BSC-CNS
Barcelona, Spain
