The English GUM corpus has nested named and non-named entity annotations for all referring expressions across 12 spoken and written genres, including places. It covers proper noun, common noun and pronoun mentions, and includes coreference annotation and named entity linking to Wikipedia, as described in this paper:
The corpus and more information can be found here:
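For readers less familiar with nested entity annotation, here is a minimal sketch of how nested named and non-named mentions, such as a place name inside a longer place description, might be represented and queried. The `Mention` class, its fields and the helper functions are hypothetical. They do not reflect GUM's actual file formats, and the toy sentence and its labels are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    """One entity mention over a token span (hypothetical representation)."""
    start: int    # index of the first token (inclusive)
    end: int      # index after the last token (exclusive)
    etype: str    # entity type label, e.g. "place" or "person"
    named: bool   # True for named (proper-noun) mentions

    def contains(self, other: "Mention") -> bool:
        """True if `other` lies within this span and is not the same span."""
        return (self.start <= other.start and other.end <= self.end
                and (self.start, self.end) != (other.start, other.end))

def place_mentions(mentions):
    """All place mentions, named or not, including nested ones."""
    return [m for m in mentions if m.etype == "place"]

def nesting_depth(mention, mentions):
    """Number of other mentions that strictly enclose `mention`."""
    return sum(1 for other in mentions if other.contains(mention))

# Toy example (invented for illustration, not taken from GUM).
tokens = "a cafe near the Washington Monument in DC".split()
mentions = [
    Mention(0, 8, "place", False),  # a cafe near the Washington Monument in DC
    Mention(3, 6, "place", True),   # the Washington Monument
    Mention(4, 5, "person", True),  # Washington
    Mention(7, 8, "place", True),   # DC
]

for m in place_mentions(mentions):
    text = " ".join(tokens[m.start:m.end])
    kind = "named" if m.named else "non-named"
    print(f"{text!r}: {kind}, depth {nesting_depth(m, mentions)}")
```

Because each mention is a plain token span, nested mentions coexist in one list, and a non-named place mention ("a cafe near ...") is retrieved alongside the named ones it contains.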
Dr. Amir Zeldes
Assoc. Prof. of Computational Linguistics
Department of Linguistics
1437 37th St. NW
Washington, DC 20057
From: corpora-bounces at uib.no <corpora-bounces at uib.no> On Behalf Of Salvador Lima
Sent: Thursday, October 28, 2021 11:54 AM
To: corpora at uib.no
Subject: [Corpora-List] Corpora, annotation schema/guidelines and NER systems for LOCATION or geonames
We are trying to get a more comprehensive view of current NLP resources for the annotation, automatic recognition, and normalization/grounding of LOCATION, geoname, or other place-related entity types (for English data and especially for data in other languages).
We have already looked at the ENAMEX tagset (LOCATION and its sub-tags) and guidelines, as well as ACE and CLIA.
We would really appreciate pointers to current NER and entity linking components, corpora, and annotation guidelines for different languages, including English, Spanish, Italian, French, German, Portuguese, and Swedish. Anything with a particular focus on movement and travel would also be of great interest.
Salvador Lima Lopez
RESEARCH ENGINEER
Life Sciences - Text Mining, BSC-CNS
Barcelona, Spain