[Corpora-List] HamleDT 3.0 / Universal Dependencies

Dan Zeman zeman at ufal.mff.cuni.cz
Fri Aug 21 16:36:40 CEST 2015

(apologies for cross-posting)

Dear colleagues,

we are pleased to announce the release of HamleDT 3.0 (Universal Dependencies).


HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of 42 existing dependency treebanks (or dependency conversions of other treebanks) in 36 languages, transformed so that they all conform to the same annotation style. In this version, HamleDT switches to Universal Dependencies (http://universaldependencies.github.io/docs/) as its target annotation style. The collection is a superset of UD 1.1 (released May 2015), to which it adds 12 “free” and 11 “patched” treebanks.

The main motivation behind HamleDT is that having many corpora adhere to the same annotation guidelines significantly facilitates any cross-language work. In particular, cross-lingual comparability of parsing results should be much better on harmonized data than among the original, non-harmonized treebanks.

If you use HamleDT in your research, please cite:

@article{hamledt,
  journal = {Language Resources and Evaluation},
  title = {Hamle{DT}: Harmonized Multi-Language Dependency Treebank},
  author = {Daniel Zeman and Ondřej Dušek and David Mareček and Martin Popel and Loganathan Ramasamy and Jan Štěpánek and Zdeněk Žabokrtský and Jan Hajič},
  year = {2014},
  address = {Dordrecht, Netherlands},
  volume = {48},
  number = {4},
  pages = {601--637},
  issn = {1574-020X},
}

Best regards

Dan Zeman

ÚFAL MFF, Charles University in Prague

zeman @ ufal.mff.cuni.cz
