We are very pleased to announce the release of the
Eukalyptus Treebank of Written Swedish, v1.0.0.
Eukalyptus contains almost 100,000 tokens of contemporary written Swedish from a range of text types and genres (novels, news texts, Wikipedia articles, blog texts and Europarl proceedings). The texts have been manually annotated with lemmata, word senses, parts of speech, multi-word units, and syntactic structure (constituents with grammatical functions).
The treebank – source texts and annotations – is released under a CC BY-SA 4.0 license, and is currently distributed in the TIGER-XML format.
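For readers who have not worked with TIGER-XML before, the following is a minimal sketch of how such a file could be read with Python's standard library. It relies only on the general TIGER-XML layout (`s`, `graph`, `terminals`/`t`, `nonterminals`/`nt`, `edge`). The sample fragment, its ids, and the `pos`, `cat` and `label` values are hand-written placeholders, not taken from Eukalyptus; the attribute names and label inventory actually used are described in the documentation shipped with the download.

```python
import xml.etree.ElementTree as ET

# Hand-written TIGER-XML fragment for illustration only.
# The tags follow the general TIGER-XML layout; the ids, pos tags,
# categories and edge labels are placeholders, NOT Eukalyptus data.
SAMPLE = """<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="Hon" pos="PRON"/>
          <t id="s1_2" word="sover" pos="VERB"/>
        </terminals>
        <nonterminals>
          <nt id="s1_500" cat="S">
            <edge label="SB" idref="s1_1"/>
            <edge label="HD" idref="s1_2"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>"""


def read_sentences(root):
    """Yield (sentence id, tokens, constituents) for each <s> element."""
    for s in root.iter("s"):
        terminals = list(s.iter("t"))
        tokens = [(t.get("word"), t.get("pos")) for t in terminals]
        # Map terminal ids to word forms so edges can be shown readably.
        words = {t.get("id"): t.get("word") for t in terminals}
        constituents = []
        for nt in s.iter("nt"):
            # Each edge links the constituent to a child (terminal or
            # nonterminal) and carries the grammatical function label.
            children = [
                (e.get("label"), words.get(e.get("idref"), e.get("idref")))
                for e in nt.iter("edge")
            ]
            constituents.append((nt.get("cat"), children))
        yield s.get("id"), tokens, constituents


sentences = list(read_sentences(ET.fromstring(SAMPLE)))
for sid, tokens, constituents in sentences:
    print(sid, tokens, constituents)
```

To read a downloaded file instead of the inline sample, `ET.parse(path).getroot()` can be passed to `read_sentences`, where `path` points to one of the XML files in the archive.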
For download details, please visit
https://spraakbanken.gu.se/en/resources/eukalyptus
The download archive also contains documentation and publications related to the design of Eukalyptus.
We hope you find Eukalyptus useful in your work. Please do not hesitate to contact us with questions or comments at <sb-info at svenska.gu.se>.
On behalf of the Eukalyptus team, Gerlof Bouma <sb-info at svenska.gu.se>