[Corpora-List] MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP

Christophe Servan christophe.servan at gmail.com
Tue Apr 26 16:52:31 CEST 2016


Dear All,

We are happy to announce the release of our new toolkit “MultiVec” for computing continuous representations of text at different granularity levels (individual words or sequences of words). MultiVec includes the word2vec features of Mikolov et al. [2013b], the paragraph vector model of Le and Mikolov [2014] (batch and online), and the bilingual distributed representation model of Luong et al. [2015]. MultiVec also includes several distance measures between words and between sequences of words. The toolkit is written in C++ and aims to be fast (within the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: analogical reasoning, sentiment analysis, and cross-lingual document classification. The toolkit also includes C++ and Python libraries that you can use to query bilingual and monolingual models.

The project is fully open to future contributions. The code is available on the project webpage (https://github.com/eske/multivec), together with installation instructions and command-line usage examples.

When you use this toolkit, please cite:

@InProceedings{MultiVecLREC2016,
  Title = {{MultiVec: a Multilingual and MultiLevel Representation Learning Toolkit for NLP}},
  Author = {Alexandre Bérard and Christophe Servan and Olivier Pietquin and Laurent Besacier},
  Booktitle = {The 10th edition of the Language Resources and Evaluation Conference (LREC 2016)},
  Year = {2016},
  Month = {May}
}

The paper is available here: https://github.com/eske/multivec/raw/master/docs/Berard_and_al-MultiVec_a_Multilingual_and_Multilevel_Representation_Learning_Toolkit_for_NLP-LREC2016.pdf

Best regards,

Alexandre Bérard, Christophe Servan, Olivier Pietquin and Laurent Besacier

(With apologies for cross-posting)
