[Corpora-List] annotation tags

Hugh Paterson III sil.linguist at gmail.com
Mon Dec 7 22:05:45 CET 2020


Greetings,

Can anyone point me to a set of annotation tags which are commonly used across corpora projects?

In the area of Field Linguistics, grammar writing is the process of publishing a description of how a natural language functions, usually as a book. Within this practice it is common to give examples as interlinear glosses, which may be word- or morpheme-aligned. Over the last 15 or so years there has been an ad hoc effort to standardize the tags used to describe morphemes in these interlinear glosses. This effort has been influenced by the Leipzig Glossing Rules (LGR), which provide a suggested list of abbreviations based on some prior art.

I have noticed that some of these abbreviations have now surfaced in annotated corpora within the domain of Language Documentation, which frequently uses a tool called ELAN to annotate audio/video texts in under-resourced languages.

So, within Language Documentation and Field Linguistics one can see the influence of LGR in the annotation tags chosen for a corpus. HOWEVER, I am wondering whether there is a different influence on the tag values one sees in corpora of better-resourced languages. That is, is there any continuity in the practice of corpus annotation regardless of the sub-field of linguistics in which the corpus originates?

Can anyone point me to a set of annotation tags which are commonly used across corpora projects?

All the best,
- Hugh Paterson III


