[Corpora-List] Announcement: Release of the Dependency Treebank Database DTDB 1.0

Joakim Nivre nivre at msi.vxu.se
Sun Feb 3 17:31:08 CET 2008

Hi Olga,

In the CoNLL shared tasks on dependency parsing in 2006 and 2007, a number of treebanks were used, some of which do not yet seem to be part of the database:

Prague Arabic Dependency Treebank
Basque Dependency Treebank
Sinica Treebank (Chinese)
Penn Treebank (English)
Tiger Treebank (German)
Greek Dependency Treebank
Szeged Treebank (Hungarian)
Italian Syntactic-Semantic Treebank
Verbmobil Treebank (Japanese)
Floresta Sintactica (Portuguese)
Metu-Sabanci Turkish Treebank

Of course, not all of these are genuine dependency treebanks, but judging from the treebanks included in the database so far, this does not seem to be a necessary requirement.

In addition, there are two dependency treebanks for Latin, although I think they are being merged, and a third treebank for Italian (the Venice Italian Treebank), which also exists in a dependency version.

Best, Joakim

On Sun, 3 Feb 2008, Eric Atwell wrote:

> On Fri, 1 Feb 2008, Olga Pustylnikov wrote:
> > My question is: do other treebanks exist which are not part of the database?
> > If you know of an existing treebank that should be transformed into the
> > unified format, please let me know.
> Olga,
> The AMALGAM multi-parsed treebank is a small sample of 60 sentences
> parsed according to 14 different parsing schemes (parser outputs or
> corpus annotation schemes); it might be an interesting challenge to
> see whether/how these different representations can be transformed
> into eGXL.
> http://www.comp.leeds.ac.uk/amalgam/amalgam/multi-parsed.html
> Eric Atwell, University of Leeds, WWW/email: google Eric Atwell
