[Corpora-List] Create an ontology based on manual text annotations

Sérgio Matos aleixomatos at ua.pt
Tue Mar 31 16:51:33 CEST 2020


You can try Egas [1]: https://demo-egas-tmp.bmd-software.com/egas/

Egas was created as an online tool for assisted text curation in the biomedical domain, but it can be used in any domain. Concepts and relations are defined per project, and new concepts/relations can be added as you go along. See the screenshot (https://bit.ly/33WB0QU) for an example I just created. The annotations can be exported to BioC (XML) or A1, as shown here:

T1	PRODUCT 4 11	Big Mac
T2	COMPANY 76 86	McDonald's
R1	IS_A_PROVIDER_OF Arg1:T2 Arg2:T1
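To make the export concrete, here is a minimal sketch of reading it back, assuming the A1 output follows the tab-separated brat standoff conventions (ID, then "TYPE start end", then the covered text; relations as "TYPE Arg1:ID Arg2:ID"). The function and class names are hypothetical, not part of Egas.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    type: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    text: str


@dataclass
class Relation:
    id: str
    type: str
    arg1: str  # id of the first argument (an Entity id)
    arg2: str  # id of the second argument


def parse_standoff(lines):
    """Parse text-bound (T) and binary relation (R) lines in brat-style
    standoff format. Other line types (events, notes, ...) and discontinuous
    spans ("0 5;10 15") are not handled in this sketch."""
    entities, relations = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        ann_id, body = line.split("\t", 1)
        if ann_id.startswith("T"):
            type_and_span, text = body.split("\t", 1)
            ann_type, start, end = type_and_span.split()
            entities[ann_id] = Entity(ann_id, ann_type, int(start), int(end), text)
        elif ann_id.startswith("R"):
            rel_type, arg1, arg2 = body.split()
            # Strip the "Arg1:" / "Arg2:" role prefixes to keep only entity ids
            relations[ann_id] = Relation(
                ann_id, rel_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1]
            )
    return entities, relations


example = [
    "T1\tPRODUCT 4 11\tBig Mac",
    "T2\tCOMPANY 76 86\tMcDonald's",
    "R1\tIS_A_PROVIDER_OF Arg1:T2 Arg2:T1",
]
entities, relations = parse_standoff(example)
for rel in relations.values():
    print(entities[rel.arg1].text, rel.type, entities[rel.arg2].text)
# prints: McDonald's IS_A_PROVIDER_OF Big Mac
```

Resolving each relation's argument ids against the entity table is what turns the flat annotation file into subject/relation/object triples.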

You can start with raw documents, or you can import annotated documents and edit/complete the annotations. For example, you could use the open-source tool Neji [2] (https://github.com/BMDSoftware/neji/wiki) with dictionaries in this simple format:

Concept_id_1	McDonalds|McDonald's
Concept_id_2	BigMac|Big Mac
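As an illustration of how such a dictionary maps surface forms to concept IDs, here is a naive exact-match sketch. It assumes each line is "concept ID, tab, names separated by |"; it is not Neji's own matching pipeline, which is considerably more elaborate, and the function names are hypothetical.

```python
import re


def load_dictionary(lines):
    """Map each surface name to its concept ID from 'ID<TAB>name1|name2' lines."""
    lexicon = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        concept_id, names = line.split("\t", 1)
        for name in names.split("|"):
            lexicon[name] = concept_id
    return lexicon


def find_mentions(text, lexicon):
    """Naive case-sensitive exact matching. Names are tried longest first so a
    longer name wins over a shorter one starting at the same position."""
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.start(), m.end(), m.group(0), lexicon[m.group(0)])
            for m in pattern.finditer(text)]


lexicon = load_dictionary([
    "Concept_id_1\tMcDonalds|McDonald's",
    "Concept_id_2\tBigMac|Big Mac",
])
print(find_mentions("The Big Mac is a hamburger sold by McDonald's.", lexicon))
# [(4, 11, 'Big Mac', 'Concept_id_2'), (35, 45, "McDonald's", 'Concept_id_1')]
```

The (start, end, type) triples produced this way are exactly the kind of pre-annotation that could be imported and then corrected by hand.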

Another potentially interesting feature is ‘blind’ annotation: you can have multiple users annotating the texts, optionally with a percentage of the corpus overlapping between them.
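How Egas distributes documents internally is not described here; the following is a hypothetical scheme showing what "a percentage of corpus overlap" can mean in practice. A fraction `overlap` of the documents goes to every annotator (useful for measuring inter-annotator agreement), and the rest are split round-robin. All names and parameters are illustrative assumptions.

```python
import random


def assign_documents(doc_ids, annotators, overlap, seed=0):
    """Hypothetical assignment: a fraction `overlap` of documents is shared by
    all annotators; the remaining documents are split round-robin."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    docs = list(doc_ids)
    random.Random(seed).shuffle(docs)  # deterministic shuffle for a given seed
    n_shared = round(overlap * len(docs))
    shared, rest = docs[:n_shared], docs[n_shared:]
    assignment = {a: list(shared) for a in annotators}
    for i, doc in enumerate(rest):
        assignment[annotators[i % len(annotators)]].append(doc)
    return assignment, shared


docs = [f"doc{i}" for i in range(10)]
assignment, shared = assign_documents(docs, ["ann1", "ann2"], overlap=0.2)
print(len(shared), {a: len(d) for a, d in assignment.items()})
# 2 {'ann1': 6, 'ann2': 6}
```

Only the shared documents are annotated by everyone, so agreement can be computed on them while the rest of the corpus is covered once.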

Some limitations:
- Egas only supports relations within the same sentence.
- Egas uses (biomedical) ontologies for concept normalisation, but these are managed centrally. Otherwise, it would be possible to normalise “It” in the second sentence to the concept ID for “Big Mac”, then use that in creating the ontology.
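The same-sentence restriction can be illustrated with a small check on character offsets. This sketch uses a deliberately naive sentence splitter (an assumption, not what Egas uses) and a made-up two-sentence text; it shows why a relation from “McDonald's” to “Big Mac” in different sentences is out of reach, while one to the pronoun “It” in the same sentence is not.

```python
import bisect
import re


def sentence_starts(text):
    """Naive splitter (an assumption, not what Egas uses): a new sentence
    starts at offset 0 and after '.', '!' or '?' followed by whitespace."""
    return [0] + [m.end() for m in re.finditer(r"[.!?]\s+", text)]


def sentence_index(starts, offset):
    """Index of the sentence containing a character offset."""
    return bisect.bisect_right(starts, offset) - 1


def same_sentence(text, span_a, span_b):
    """True if both (start, end) spans begin in the same sentence."""
    starts = sentence_starts(text)
    return sentence_index(starts, span_a[0]) == sentence_index(starts, span_b[0])


text = "The Big Mac is a hamburger. It is sold by McDonald's."
big_mac, it, mcdonalds = (4, 11), (28, 30), (42, 52)
print(same_sentence(text, mcdonalds, big_mac))  # False: different sentences
print(same_sentence(text, mcdonalds, it))       # True: "It" shares the sentence
```

If “It” were normalised to the same concept ID as “Big Mac”, the within-sentence relation would carry over to the Big Mac concept, which is the workaround described above.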

Regards,
Sérgio Matos

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4207226/
[2] https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-281

On 28 March 2020 at 09:17:28, Louis de Viron (louis at datatext.eu) wrote:

Dear all,

I am looking for a tool for creating ontologies based on a manual annotation of texts. We want to easily annotate entities and relations between them. Example:

* “Big Mac” is manually annotated as a PRODUCT

* “McDonald’s” is manually annotated as a COMPANY

* “Big Mac” and “McDonald’s” are linked by a relation which is manually annotated as IS_A_PROVIDER_OF.

This implies the following functionality:

1. Definition of an ontology schema (PRODUCT, COMPANY, IS_A_PROVIDER_OF)

2. Entities with defined attributes

3. Relationships between predefined entity types (the relation IS_A_PROVIDER_OF should only be possible between the entity types COMPANY and PRODUCT). Annotating relations should be user-friendly, including relations between entities anywhere in the document, not only within one sentence.

4. A plain text annotation tool

5. A way to export the ontology that has been created
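Requirements 1, 3 and 5 can be sketched together as a schema that constrains relation signatures, plus a simple export. This is a hypothetical illustration rather than any existing tool's API: entity attributes (requirement 2) are not modelled, and the TSV triple layout and the `TYPE` predicate are arbitrary choices; a real export would more likely target RDF/OWL, for instance via a library such as rdflib.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    entity_types: set
    # relation type -> (required Arg1 type, required Arg2 type)
    relation_types: dict


@dataclass
class Ontology:
    schema: Schema
    entities: dict = field(default_factory=dict)  # entity name -> entity type
    relations: set = field(default_factory=set)   # (subject, relation, object)

    def add_entity(self, name, entity_type):
        if entity_type not in self.schema.entity_types:
            raise ValueError(f"unknown entity type {entity_type!r}")
        self.entities[name] = entity_type

    def add_relation(self, subject, relation, obj):
        # Reject relations whose argument types do not match the schema
        if relation not in self.schema.relation_types:
            raise ValueError(f"unknown relation type {relation!r}")
        expected = self.schema.relation_types[relation]
        actual = (self.entities.get(subject), self.entities.get(obj))
        if actual != expected:
            raise ValueError(f"{relation} requires {expected}, got {actual}")
        self.relations.add((subject, relation, obj))

    def export_tsv(self):
        """One triple per line; entity types use a hypothetical TYPE predicate."""
        lines = [f"{name}\tTYPE\t{etype}" for name, etype in sorted(self.entities.items())]
        lines += ["\t".join(triple) for triple in sorted(self.relations)]
        return "\n".join(lines)


schema = Schema(
    entity_types={"PRODUCT", "COMPANY"},
    relation_types={"IS_A_PROVIDER_OF": ("COMPANY", "PRODUCT")},
)
onto = Ontology(schema)
onto.add_entity("Big Mac", "PRODUCT")
onto.add_entity("McDonald's", "COMPANY")
onto.add_relation("McDonald's", "IS_A_PROVIDER_OF", "Big Mac")
print(onto.export_tsv())
```

Checking types at the moment a relation is added means a mistaken annotation such as "Big Mac IS_A_PROVIDER_OF McDonald's" is rejected immediately rather than discovered at export time.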

I am wondering whether such a tool exists and is available.

Many thanks for your help.

Best regards


Louis de VIRON - DataText SPRL
Freelance Data Scientist - NLP Engineer
Web: www.datatext.eu
