The POS tags and morphological category values used in the Universal Dependencies project are becoming popular across a wide range of languages, both high-resource and low-resource. You can read more about the inventories here:

Universal (coarse) POS tags: https://universaldependencies.org/u/pos/index.html

Morphological features: https://universaldependencies.org/u/feat/index.html

Syntactic function labels: https://universaldependencies.org/u/dep/index.html
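
To make those three inventories concrete: Universal Dependencies treebanks are distributed in the CoNLL-U format, where each token is one tab-separated line of ten columns. Here is a minimal sketch of reading such a line in Python. The helper name `parse_token_line` and the example token are my own illustration, not taken from any particular treebank or library.

```python
# The ten tab-separated columns of a CoNLL-U token line, in order.
CONLLU_FIELDS = [
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
]


def parse_token_line(line):
    """Split one CoNLL-U token line into a dict keyed by column name.

    Hypothetical helper for illustration; it does not handle comment
    lines, multiword-token ranges, or empty nodes.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError(f"expected 10 columns, got {len(values)}")
    token = dict(zip(CONLLU_FIELDS, values))
    # FEATS is either "_" (no features) or "Name=Value" pairs joined by "|".
    feats = token["FEATS"]
    if feats == "_":
        token["FEATS"] = {}
    else:
        token["FEATS"] = dict(pair.split("=", 1) for pair in feats.split("|"))
    return token


# Invented example line: the word "dogs" as a plural noun subject.
example = "2\tdogs\tdog\tNOUN\tNNS\tNumber=Plur\t3\tnsubj\t_\t_"
token = parse_token_line(example)
print(token["UPOS"], token["FEATS"], token["DEPREL"])
```

The UPOS column holds the coarse POS tag, FEATS the morphological features, and DEPREL the syntactic function label, which correspond to the three pages linked above.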

For morphological categories you may also want to check out UniMorph:





Can anyone point me to a set of annotation tags which are commonly used across corpus projects?

In the area of Field Linguistics, grammar writing is the process of publishing a description of how a natural language functions, usually as a book. Within this practice it is common to give examples as interlinear glosses, which may be word-aligned or morpheme-aligned. Over the last 15 or so years there has been an ad hoc effort to standardize the tags used to describe morphemes in these interlinear glosses. This effort has been influenced by the Leipzig Glossing Rules (LGR), which provide a suggested list of abbreviations based on some prior art.

I have noticed that some of these abbreviations have now surfaced in annotated corpora within the domain of Language Documentation, which frequently uses a tool called ELAN to annotate audio/video texts in under-resourced languages.

So, within Language Documentation and Field Linguistics one can see the influence of LGR in the types of annotation tags chosen for a corpus. However, I am wondering whether a different influence shapes the types of values one sees in corpora of better-resourced languages. That is, is there any continuity in the practice of corpus annotation regardless of the sub-field of linguistics in which the corpus originates?

