[Corpora-List] Universal Dependencies, version 1

Joakim Nivre joakim.nivre at lingfil.uu.se
Sat Oct 4 17:11:31 CEST 2014

We are happy to announce the release of the annotation guidelines for Universal Dependencies at http://universaldependencies.github.io/docs/.

Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008). The general philosophy is to provide a universal inventory of categories and guidelines to facilitate consistent annotation of similar constructions across languages, while allowing language-specific extensions when necessary.

We intend to treat version 1 as stable for at least the next year, but we may subsequently make further revisions based on experience gained from using it to annotate treebanks for a range of languages. Our goal is to make a first release of data sets with language-specific documentation by January 1, 2015. If you are interested in contributing to this effort, please get in touch.

Jinho Choi, Marie-Catherine de Marneffe, Tim Dozat, Filip Ginter, Yoav Goldberg, Jan Hajic, Christopher Manning, Ryan McDonald, Joakim Nivre, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, Dan Zeman

