[Corpora-List] Corpora, annotation schema/guidelines and NER systems for LOCATION or geonames

ALEX Bea b.alex at ed.ac.uk
Fri Oct 29 12:51:55 CEST 2021

Dear Salvador,

for English, there’s the Edinburgh Geoparser, which you may already have come across.

It recognises place names and resolves them to gazetteers. Geonames is one of the gazetteers that can be specified, but there are others. You’ll find more information on its functionality and the gazetteers it supports in the documentation.

The locations recognised are broadly similar to ENAMEX ones.



On 28 Oct 2021, at 16:53, Salvador Lima <salvador.limalopez at gmail.com> wrote:

Dear all,

We are trying to build a more comprehensive view of the current NLP resources for the annotation, automatic recognition, and normalization/grounding of LOCATION or geonames/place-related entity types (for data in English and, in particular, in other languages).

We have looked at the ENAMEX tagset (Location and sub-tags) and guidelines, as well as ACE and CLIA.

We would really appreciate feedback on current NER and entity linking components, corpora, and also annotation guidelines for different languages, including English, Spanish, Italian, French, German, Portuguese, or Swedish. Anything with a special focus on movements and travels would also be really interesting.

Best regards,

--
Salvador Lima Lopez
RESEARCH ENGINEER
Life Sciences - Text Mining, BSC-CNS
Barcelona, Spain
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
https://mailman.uib.no/listinfo/corpora

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
