[Corpora-List] A new dataset release - WS353 translated to multiple languages and re-scored by fluent speakers of these languages

Taher Pilehvar pilehvar at di.uniroma1.it
Tue Aug 18 12:42:27 CEST 2015


Following the release of the Multilingual WordSim-353 by Leviant and Reichart (2015), we announce the availability of six cross-lingual word similarity datasets. They were constructed automatically using the framework described in our paper, which was recently presented at ACL (see the references below). The following language pairs are covered:

* English-German
* English-Italian
* English-Russian
* German-Italian
* German-Russian
* Italian-Russian

To download, please visit http://lcl.uniroma1.it/similarity-datasets/
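
For readers who have not used this kind of resource before, here is a minimal sketch of how a cross-lingual word similarity dataset is commonly used: score each word pair with the cosine similarity of its vectors and report the Spearman rank correlation against the human scores. Everything in it is an assumption made for illustration: the `(word1, word2, score)` tuple layout, the function names, and the toy vectors and scores, which are invented and not taken from the released files. Please check the download page for the actual file format.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def evaluate(pairs, vectors):
    """Correlate model similarities with gold scores.

    pairs: list of (word1, word2, gold_score) tuples (hypothetical layout).
    vectors: dict mapping each word to a vector in one shared space.
    Pairs with an out-of-vocabulary word are skipped.
    """
    gold, predicted = [], []
    for word1, word2, score in pairs:
        if word1 in vectors and word2 in vectors:
            gold.append(score)
            predicted.append(cosine(vectors[word1], vectors[word2]))
    return spearman(gold, predicted)


# Invented English-German toy data, for illustration only.
toy_vectors = {
    "car": [1.0, 0.0],
    "Auto": [0.9, 0.1],
    "Strasse": [0.6, 0.4],
    "Banane": [0.0, 1.0],
}
toy_pairs = [("car", "Auto", 9.0), ("car", "Strasse", 5.0), ("car", "Banane", 1.0)]
toy_correlation = evaluate(toy_pairs, toy_vectors)
```

The sketch assumes that the words of both languages are embedded in a single shared vector space; how to obtain such a space is outside its scope.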

*References:*

José Camacho-Collados, Mohammad Taher Pilehvar and Roberto Navigli. *A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets*. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015), Beijing, China, 2015, pp. 1-7.

Ira Leviant and Roi Reichart. *Judgment Language Matters: Multilingual Vector Space Models for Judgment Language Aware Lexical Semantics*. 2015. Preprint published on arXiv. arXiv:1508.00106

On Wed, Aug 12, 2015 at 7:32 PM, Roi Reichart <roiri at ie.technion.ac.il> wrote:


> Greetings,
>
> We would like to announce the release of a new resource - multilingual
> WS353. This resource consists of translations of the WS353 word
> association data set to three languages: German, Italian and Russian.
> Each of the translated datasets is scored by 13 human judges (crowd
> workers) - all fluent speakers of its language. For consistency, we
> also collected human judgments for the original English corpus
> according to the same protocol applied to the other languages.
>
> This dataset makes it possible to explore the impact of the "judgment
> language" (the language in which word pairs are presented to the human
> judges) on the resulting similarity scores and to evaluate vector space
> models in a truly multilingual setup (i.e., when both the training and
> the test data are multilingual).
>
> The translation and annotation process, as well as related research on
> the impact of judgment language are described in the paper:
>
> Judgment Language Matters: Multilingual Vector Space Models for
> Judgment Language Aware Lexical Semantics. 2015. Ira Leviant, Roi
> Reichart. Preprint published on arXiv. arXiv:1508.00106
>
> The data and paper can be downloaded from the project page at:
>
> http://technion.ac.il/~irakr/MultilingualVSMdata.html
>
> We will soon release similar data for the SimLex-999 word similarity
> dataset.
>
> Please do not hesitate to contact Ira or myself with any question you
> may have regarding this data.
>
> Best,
> Roi Reichart
