[Corpora-List] Communicator corpora parsed?

David Reitter david.reitter at gmail.com
Fri Jul 15 16:11:02 CEST 2005

I received two replies to my earlier question regarding the
availability of syntactic annotations of the DARPA Communicator
corpus and of other spoken dialogue corpora.
Both Sandra Kübler at Tübingen and Detmar Meurers at Ohio State
recommended the Verbmobil treebanks, which contain spoken dialogue in
German, English and Japanese. They are available via


A newer version of the German treebank is in preparation.

As a side note: much (if not most) of the non-canned, spontaneous
speech in Communicator consists of very short utterances. In
contrast, the Maptask corpus (developed here at HCRC, Edinburgh;
spoken human-human dialogue) has a lot to offer in terms of syntax.

Thanks for the replies.

> is anyone aware of syntactic annotations of the (e.g. DARPA)
> Communicator corpus, or similar large, task-oriented human/machine
> or human/human dialogue corpora?
> I'm looking for tree structures, and atomic categories such as VP
> or PP would do just fine. I could work with non-perfect (i.e.
> machine-parsed) annotations.
> Generally I'd be grateful for tips regarding larger spoken
> dialogue corpora (task-oriented dialogue) that have been
> syntactically annotated.


David Reitter - ICCS/HCRC, Informatics, University of Edinburgh

