[Corpora-List] List of papers with UD interannotator agreement

Arne Skjærholt arnskj at ifi.uio.no
Mon Jul 9 21:15:45 CEST 2018

I'm not aware of any UD inter-annotator agreement corpora. However, for my paper on a chance-corrected IAA metric for syntax[0], I contacted a large number of treebank projects to ask for access to their IAA data. In the end I obtained four corpora: the Norwegian Dependency Treebank (NDT; through my own collaboration with that project), the Copenhagen Dependency Treebanks (CDT; available from their SVN repository), the Prague Czech-English Dependency Treebank (thanks to Jan Stepanek), and the StarSem Data (SSD) from the LiNGO consortium (thanks to Emily Bender and Stephan Oepen). The code repository for my paper has the CDT, NDT and SSD corpora ready for processing. The Danish portion of the CDT and the NDT are both available in UD conversions, but the original data are not native UD.

The paper also reviews previous approaches to agreement for syntax (and explains why I find them insufficient), which may be of interest if you are not already aware of it. Note that the \alpha_{diff} metric discussed in the paper is invalid, because \delta_{diff} is not a metric function. This will be discussed in my PhD thesis, which unfortunately is not published yet.
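For readers unfamiliar with the term: a distance function is a metric when it is non-negative, is zero exactly for identical items, is symmetric, and satisfies the triangle inequality. The following is a minimal sketch of how one might check these axioms by brute force on a small sample. The helper name `metric_violations` and the two toy distance functions are hypothetical illustrations; they are not \delta_{diff} and not code from the paper's repository.

```python
from itertools import product


def metric_violations(items, dist, tol=1e-12):
    """Brute-force check of the metric axioms for `dist` on a finite sample.

    Returns a list of (axiom_name, witness_tuple) pairs; an empty list means
    no violation was found on this sample (it does not prove `dist` is a
    metric in general).
    """
    violations = []
    for x, y in product(items, repeat=2):
        d = dist(x, y)
        if d < -tol:
            violations.append(("non-negativity", (x, y)))
        # Distance must be zero if and only if the two items are equal.
        if (x == y) != (abs(d) <= tol):
            violations.append(("identity", (x, y)))
        if abs(d - dist(y, x)) > tol:
            violations.append(("symmetry", (x, y)))
    for x, y, z in product(items, repeat=3):
        # Going via y must never be shorter than going directly.
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:
            violations.append(("triangle", (x, y, z)))
    return violations


# Toy example (assumption: integers stand in for annotations).
sample = [0, 1, 2]
abs_diff = lambda a, b: abs(a - b)       # a metric on the integers
sq_diff = lambda a, b: (a - b) ** 2      # violates the triangle inequality

print(metric_violations(sample, abs_diff))
print(metric_violations(sample, sq_diff))
```

A check of this kind can only exhibit counterexamples on the sample it is given; a clean result is not a proof that the function is a metric.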

Regards, Arne Skjærholt

0: Skjærholt (2014): "A chance-corrected measure of inter-annotator agreement for syntax", Proc. ACL
