[Corpora-List] [Release] Outlier detection task: test your word embeddings!

Jose Camacho Collados collados at di.uniroma1.it
Mon Jun 27 11:14:50 CEST 2016


We have developed a new dataset (including an easy-to-use Python scorer for word embeddings) based on the outlier detection task. Given a group of words, the goal of the outlier detection task is to identify the word that does not belong in the group. For example, book would be the outlier in the set of words {apple, banana, lemon, book, orange}, as it is not a fruit like the others. This task is particularly suitable for testing interesting properties of word vectors that common intrinsic evaluation benchmarks, such as word similarity, have not fully addressed to date. Although the task is well defined and humans achieve near-perfect performance, it remains challenging for state-of-the-art word embeddings.
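To make the task concrete, here is a minimal Python sketch of one simple way to pick the outlier using word vectors. It assumes cosine similarity and embeddings stored in a plain dict. The function names and the toy 2-dimensional vectors are hypothetical, made up for illustration. The scoring rule picks the word whose removal leaves the remaining words most similar to one another on average. It is only an illustrative approximation and is not necessarily the exact procedure used by the released scorer or defined in the reference paper.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compactness(words: List[str], emb: Dict[str, Vector]) -> float:
    """Average pairwise cosine similarity within a group (needs >= 2 words)."""
    pairs = list(combinations(words, 2))
    return sum(cosine(emb[a], emb[b]) for a, b in pairs) / len(pairs)


def find_outlier(group: List[str], emb: Dict[str, Vector]) -> str:
    """Return the word whose removal leaves the most compact remaining group.

    Hypothetical scoring rule for illustration; assumes at least 3 distinct words.
    """
    def score_without(word: str) -> float:
        rest = [w for w in group if w != word]
        return compactness(rest, emb)

    return max(group, key=score_without)


# Hypothetical toy embeddings: fruits point one way, "book" another.
toy_embeddings = {
    "apple": [0.9, 0.1],
    "banana": [0.8, 0.2],
    "lemon": [0.85, 0.15],
    "orange": [0.95, 0.05],
    "book": [0.1, 0.9],
}

if __name__ == "__main__":
    group = ["apple", "banana", "lemon", "book", "orange"]
    print(find_outlier(group, toy_embeddings))  # prints: book
```

Removing "book" leaves four nearly parallel fruit vectors, so that subset has the highest average similarity. Removing any fruit instead keeps "book" in the group and drags the average down. With real embeddings you would load pretrained vectors in place of the toy dict.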

Please find more information about the dataset and the outlier detection task in the reference paper. The dataset and the Python script to test your word embeddings are freely available at http://lcl.uniroma1.it/outlier-detection/

Reference:

José Camacho-Collados and Roberto Navigli. Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations. In Proceedings of the ACL Workshop on Evaluating Vector Space Representations for NLP, Berlin, Germany, August 12, 2016.

http://lcl.uniroma1.it/outlier-detection/ACL16_REPEVAL_Outlier_Detection.pdf

Best regards,

José Camacho Collados and Roberto Navigli
Linguistic Computing Laboratory, Sapienza University of Rome

-- 
José Camacho Collados
Linguistic Computing Laboratory (LCL)
Sapienza University of Rome
http://wwwusers.di.uniroma1.it/~collados/


