[Corpora-List] BOUNCE corpora at lists.uib.no: Non-member submission from [David Reitter <dreitter at inf.ed.ac.uk>] (fwd)

Knut Hofland knut at aksis.uib.no
Thu Jul 14 16:36:00 CEST 2005

From: David Reitter <dreitter at inf.ed.ac.uk>
Subject: Re: [Corpora-List] Communicator corpora parsed?
Date: Thu, 14 Jul 2005 14:13:59 +0100
To: corpora at hd.uib.no

I received two replies to my earlier question regarding the
availability of syntactic annotations of the DARPA Communicator
corpus and of other spoken dialogue corpora.
Both Sandra Kübler at Tübingen and Detmar Meurers at Ohio State
recommended the Verbmobil treebanks, which contain spoken dialogue in
German, English and Japanese. They are available via


A newer version of the German treebank is in preparation.

As a side note: much (if not most) of the non-canned, spontaneous
speech in Communicator consists of very short utterances. In
contrast, the Maptask corpus (developed here at HCRC, Edinburgh;
spoken human-human dialogue) has a lot to offer in terms of syntax.

Thanks for the replies.

> is anyone aware of syntactic annotations of the (e.g. DARPA)
> Communicator corpus, or similar large, task-oriented human/machine
> or human/human dialogue corpora?
> I'm looking for tree structures, and atomic categories such as VP
> or PP would do just fine. I could work with non-perfect (i.e.
> machine-parsed) annotations.
> Generally I'd be grateful for tips regarding larger spoken
> dialogue corpora (task-oriented dialogue) that have been
> syntactically annotated.
