For a recent experiment on designing a tagset following this framework, take a look at: Serge Sharoff, Mikhail Kopotev, Tomaz Erjavec, Anna Feldman, and Dagmar Divjak. Designing and evaluating a Russian tagset. In Proceedings of the Sixth Language Resources and Evaluation Conference, LREC 2008, Marrakech, 2008. http://corpus.leeds.ac.uk/mocky/lrec2008-msd.pdf
Serge
-----Original Message-----
From: corpora-bounces at uib.no on behalf of Adam Teichert
Sent: Fri 30/01/2009 20:53
To: corpora at uib.no
Subject: [Corpora-List] Universal POS Tagset
Hello all.
I've been looking for a POS tagset that is general enough to effectively tag "any" natural language. (I'm looking at Linguistic Typology / Universal Implications so I want to compare POS taggings across many [possibly obscure] languages.) Does anyone know of such a tagset?
If anyone is interested in what I've found so far, this paper seems relevant:
"Induction of Fine-grained Part-of-speech Taggers via Classifier Combination and Crosslingual Projection" (Elliott Franco Drábek, David Yarowsky)
http://acl.ldc.upenn.edu/W/W05/W05-0807.pdf
Also, I'm aware of some efforts at Microsoft Research India that may be aimed at developing a "universal" tagset for Indian languages:
http://research.microsoft.com/en-us/groups/mls/default.aspx
Thanks for any ideas.
--Adam (R. Teichert)
MS Student
School of Computing
University of Utah
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora