[Corpora-List] Large resources of pairwise semantic proximity judgments / Word Usage Graphs

Dominik Schlechtweg dominik.schlechtweg at gmx.de
Sat Nov 6 00:06:26 CET 2021


Dear colleagues,

I would like to draw your attention to the release of several large Word Usage Graph data sets, which may be of interest for your work. Word Usage Graphs (WUGs) represent the usages of a word as nodes in a graph, connected by weighted edges that encode human-annotated semantic proximity. They can be exploited in various ways, e.g. as resources of thousands of semantic proximity judgments on word usage pairs, or as clustered graph representations of sets of word usages. As such, they offer various ways to evaluate computational models of lexical semantics (e.g. contextualized embeddings, word sense disambiguation/discrimination), with additional dimensions such as variation over time or dialect.
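
To make the graph structure concrete, here is a minimal Python sketch of how a WUG could be built from pairwise judgments and then clustered. Everything in it is an illustrative assumption: the usage IDs and scores are made up, the 1-4 proximity scale is assumed, and the thresholded connected-components clustering is a simple stand-in, not the clustering procedure used for the released data sets.

```python
from collections import defaultdict
from statistics import median

# Hypothetical judgments: (usage_id_1, usage_id_2, proximity score).
# The IDs, the scores and the 1-4 scale are assumptions for illustration.
judgments = [
    ("u1", "u2", 4), ("u1", "u2", 3),  # two annotators judged the same pair
    ("u2", "u3", 4),
    ("u3", "u4", 1),
    ("u4", "u5", 4),
    ("u1", "u4", 2),
]


def build_graph(judgments):
    """Aggregate repeated judgments per usage pair into one edge weight (median)."""
    scores = defaultdict(list)
    for a, b, score in judgments:
        scores[tuple(sorted((a, b)))].append(score)
    return {pair: median(values) for pair, values in scores.items()}


def cluster(edges, threshold=2.5):
    """Group usages linked by edges at or above the threshold (connected components)."""
    adjacency = defaultdict(set)
    for (a, b), weight in edges.items():
        adjacency[a]  # touch both nodes so isolated usages still appear
        adjacency[b]
        if weight >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, clusters = set(), []
    for node in sorted(adjacency):
        if node in seen:
            continue
        stack, component = [node], []
        seen.add(node)
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


edges = build_graph(judgments)
clusters = cluster(edges)
print(clusters)
```

The dictionary returned by build_graph corresponds to the first use mentioned above (a set of pairwise proximity judgments), and the output of cluster to the second (a clustered representation of the usages). The threshold of 2.5 is an arbitrary choice for this toy example.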

The paper describing a large part of the data sets is

Dominik Schlechtweg, Nina Tahmasebi, Simon Hengchen, Haim Dubossarsky, and Barbara McGillivray. 2021. *DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages.* In Proceedings of EMNLP 2021. https://arxiv.org/abs/2104.08540

and the data sets are available in *English, German, Latin, Swedish and Spanish* at

https://www.ims.uni-stuttgart.de/data/wugs

The paper will be presented at EMNLP this Tuesday at Poster Session III (08:30 AST).

Best, Dominik


