We are glad to announce the release of the Semantic Textual Similarity (STS) Benchmark.
The STS Benchmark comprises a selection of the English datasets used in the STS tasks organized in the context of SemEval between 2012 and 2017. The selection of datasets includes text from image captions, news headlines and user forums.
The main goal is to provide a standard benchmark for comparing meaning representation systems in future years. Previously, authors have reported results across different years, each with a different mixture of genres and training conditions, making systems difficult to compare.
We organized the STS Benchmark into train, development and test sets. The development set can be used to develop systems and tune their hyperparameters, and the test set should be used only once, for the final evaluation of the system.
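As a minimal sketch of this protocol: STS systems predict a similarity score for each sentence pair and are conventionally compared by Pearson correlation with the gold scores (on a 0-5 scale). The scores below are made-up illustrative values, not real benchmark results.

```python
from math import sqrt

def pearson(gold, pred):
    """Pearson correlation between gold and predicted similarity scores."""
    n = len(gold)
    mg = sum(gold) / n
    mp = sum(pred) / n
    cov = sum((g - mg) * (p - mp) for g, p in zip(gold, pred))
    var_g = sum((g - mg) ** 2 for g in gold)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov / sqrt(var_g * var_p)

# Hypothetical gold and system scores for a handful of development pairs.
gold = [4.8, 3.2, 1.0, 2.5, 0.4]
pred = [4.5, 3.0, 1.5, 2.8, 0.2]
print(round(pearson(gold, pred), 3))  # -> 0.981
```

One would tune hyperparameters by maximizing this correlation on the development set, then report the correlation on the test set from a single final run.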
We already have results for some relevant systems. Please find all details at http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark
eneko, dan, mona
Eneko Agirre
Euskal Herriko Unibertsitatea / Universidad del País Vasco / University of the Basque Country
http://ixa2.si.ehu.eus/eneko