[Corpora-List] The Eukalyptus Treebank of Written Swedish, v1.0.0

Gerlof Bouma gerlof.bouma at gu.se
Fri Oct 1 16:28:05 CEST 2021

Dear all,

We are very pleased to announce the release of the

Eukalyptus Treebank of Written Swedish, v1.0.0.

Eukalyptus contains almost 100 thousand tokens of contemporary written Swedish from several text types and genres (novels, news texts, Wikipedia articles, blog texts and Europarl proceedings). The texts have been manually annotated with lemmata, word senses, parts of speech, multi-word units, and syntactic structure (constituents with grammatical functions).

The treebank – source texts and annotations – is released under a CC BY-SA 4.0 license, and is currently distributed in the TIGER-XML format.

For download details, please visit


The download archive also contains documentation and publications related to the design of Eukalyptus.

We hope you find Eukalyptus useful in your work. Please do not hesitate to contact us with questions or comments at <sb-info at svenska.gu.se>.

On behalf of the Eukalyptus team,
Gerlof Bouma <sb-info at svenska.gu.se>
