[Corpora-List] large and multi-lingual collections of semantic dependency graphs

Stephan Oepen oe at ifi.uio.no
Sat Jun 25 00:07:12 CEST 2016


dear colleagues,

we are happy to announce the general availability of a large and carefully curated collection of target representations for Semantic Dependency Parsing (SDP), which were previously used in the 2014 and 2015 Semantic Evaluation Exercises (SemEval). these representations take the form of bi-lexical semantic dependency graphs, where nodes correspond to surface tokens and binary, asymmetric dependency edges encode predicate–argument structure.

unlike common target representations in syntactic dependency parsing, the SDP graphs relax standard structural constraints on syntax trees, i.e. they need not be singly-rooted, allow re-entrancies (argument sharing across predicates) and crossing edges, and can leave semantically vacuous tokens unconnected. at the same time, these graphs are less partial than typical target representations in semantic role labeling, providing predicate–argument relations for all content words.
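to make these structural differences concrete, here is a minimal python sketch (not part of the SDP release or its tools) that stores a bi-lexical graph as a set of labeled (head, dependent) edges over 0-based token positions, and checks for multiple structural roots (here simply taken to be nodes with outgoing but no incoming edges), re-entrancies, crossing edges, and unconnected tokens. the class name, the toy sentence and its analysis, and the labels ARG1/ARG2 are illustrative assumptions, not taken from the SDP data.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# illustrative sketch only: not the SDP file format or any official SDP tool.
# an edge (head, dependent, label) links two 0-based token positions.
Edge = tuple[int, int, str]


@dataclass
class BilexicalGraph:
    tokens: list[str]
    edges: set[Edge] = field(default_factory=set)

    def roots(self) -> set[int]:
        """Nodes with outgoing but no incoming edges (there may be several)."""
        heads = {h for h, _, _ in self.edges}
        dependents = {d for _, d, _ in self.edges}
        return heads - dependents

    def reentrant_nodes(self) -> set[int]:
        """Nodes that are arguments of more than one predicate."""
        incoming = defaultdict(set)
        for h, d, _ in self.edges:
            incoming[d].add(h)
        return {d for d, heads in incoming.items() if len(heads) > 1}

    def crossing_pairs(self) -> list[tuple[tuple[int, int], tuple[int, int]]]:
        """Pairs of edge spans that cross when drawn above the sentence."""
        spans = sorted({(min(h, d), max(h, d)) for h, d, _ in self.edges})
        pairs = []
        for i, (l1, r1) in enumerate(spans):
            for l2, r2 in spans[i + 1:]:
                # spans are sorted, so l1 <= l2; they cross iff l1 < l2 < r1 < r2
                if l1 < l2 < r1 < r2:
                    pairs.append(((l1, r1), (l2, r2)))
        return pairs

    def unconnected(self) -> set[int]:
        """Tokens with no incident edges, e.g. semantically vacuous ones."""
        touched = {h for h, _, _ in self.edges} | {d for _, d, _ in self.edges}
        return set(range(len(self.tokens))) - touched


# toy analysis (hypothetical): "Kim" is an argument of both "wants" and "sleep"
g = BilexicalGraph(
    tokens=["Kim", "wants", "to", "sleep", "."],
    edges={(1, 0, "ARG1"), (1, 3, "ARG2"), (3, 0, "ARG1")},
)
print(g.roots())            # {1}
print(g.reentrant_nodes())  # {0}
print(g.unconnected())      # {2, 4}
```

in this toy graph, "to" and the full stop stay unconnected, which no syntax tree would allow; a tree-shaped target would also forbid the second incoming edge on "Kim".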

the SDP semantic dependency graphs are grounded in formal linguistic theory, viz. Combinatory Categorial Grammar (CCG), Functional Generative Description (FGD), and Head-Driven Phrase Structure Grammar (HPSG). for English, SDP provides four parallel (sentence- and token-aligned) annotations for some 900,000 tokens of running text from the venerable WSJ and Brown corpora. comparable data volumes are available for Chinese and Czech, albeit in only one target representation each.

for general background on the SDP dependency graphs, results from the earlier SemEval tasks, and access details, please see the following pages (and summary papers linked there):

http://sdp.delph-in.net/

the LDC has just published a public re-release of the original SDP 2014 and 2015 data (including all ‘companion’ data, the official scorers, and all system submissions received). in this new package, we have also added a fourth target representation, dubbed CCD, which seeks to make available a canonical version of the conversion from CCGbank files to bi-lexical dependency graphs. for the complete LDC release of the SDP 2016 package, please see:

https://catalog.ldc.upenn.edu/LDC2016T10
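the official scorers in the package remain the reference for evaluation. purely to illustrate the underlying idea, the following python sketch computes micro-averaged labeled and unlabeled precision, recall, and F1 over dependency edges, assuming each sentence's graph is given as a set of (head, dependent, label) triples keyed by a sentence identifier. `prf` and `score` are hypothetical helpers, not the official scorer, which reports further metrics and may differ in details.

```python
# hypothetical helper illustrating edge-level scoring; NOT the official SDP scorer.
Edge = tuple[int, int, str]  # (head, dependent, label), token positions within a sentence


def prf(gold: set, system: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 of a system set against a gold set."""
    correct = len(gold & system)
    p = correct / len(system) if system else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def score(gold: dict[str, set[Edge]],
          system: dict[str, set[Edge]]) -> dict[str, tuple[float, float, float]]:
    """Micro-averaged labeled and unlabeled P/R/F1 over all sentences."""
    # prefix each edge with its sentence id so edges from different sentences never match
    g = {(sid, *e) for sid, edges in gold.items() for e in edges}
    s = {(sid, *e) for sid, edges in system.items() for e in edges}
    return {
        "labeled": prf(g, s),
        # drop the label, keeping (sentence id, head, dependent)
        "unlabeled": prf({t[:3] for t in g}, {t[:3] for t in s}),
    }


gold = {"s1": {(1, 0, "ARG1"), (1, 3, "ARG2"), (3, 0, "ARG1")}}
system = {"s1": {(1, 0, "ARG1"), (1, 3, "ARG1"), (3, 2, "ARG1")}}
print(score(gold, system))
```

here only one system edge matches a gold edge including its label, giving labeled precision, recall, and F1 of 1/3, while two edges match when labels are ignored, giving unlabeled scores of 2/3.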

a sub-set of the SDP target representations is not derived from LDC annotations and is thus available for direct download under a Creative Commons licensing scheme. further information on the contents of this Open SDP sub-set is available on the following page:

http://sdp.delph-in.net/index.php?page=5

we hope to stimulate continued research interest in the SDP parsing problem (and, ideally, more cross-framework comparison of the various graph representations). in case you would like to use the SDP target representations in your own work (syntactico-semantic parsing into graph-structured target representations or linguistic comparison across different schools of thought), or if you have suggestions for improving or correcting the above web pages and data packages, we will be delighted to hear from you.

best wishes, oe (for the SDP task organizers)

dan flickinger, jan hajič, angelina ivanova, marco kuhlmann, yusuke miyao, stephan oepen, daniel zeman


