I would recommend something like ELMo: https://allennlp.org/elmo, https://arxiv.org/abs/1802.05365
It is a large pre-trained language model that works well on most semantic tasks (https://gluebenchmark.com). There are a bunch of models that perform even better, but I am not sure how easily available they are. For ELMo, you just need to install AllenNLP.
----- Original Message -----
From: "Ignacio J. Iacobacci" <iiacobac at gmail.com>
To: osherenko at gmx.de
Cc: corpora at uib.no
Sent: Friday, 19 October, 2018 11:13:42
Subject: Re: [Corpora-List] Geometrical representation of NL phrases for similarity comparison
There are many options, much better than this one, but doc2vec, the extension of word2vec to sentences and documents, will work for you: https://radimrehurek.com/gensim/models/doc2vec.html
All the best!
On Fri, 19 Oct 2018 at 10:10, Alexander Osherenko <osherenko at gmx.de> wrote:
Thanks, Mohammad. Unfortunately, I am looking for a geometric representation of phrases, not of words.
On Fri, 19 Oct 2018 at 11:01, Mohammad Akbari <akbari.ma at gmail.com> wrote:
Word embedding models, such as word2vec and GloVe, are common approaches, in which each word is represented by a numerical vector (https://arxiv.org/pdf/1310.4546.pdf, https://code.google.com/archive/p/word2vec/). Once you have word embeddings, you can do geometric computations on those vectors. A common approach is to compute the average embedding of all words in a phrase; you can check fastText for this purpose.
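To make the averaging idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the three-dimensional vectors in TOY_VECTORS are invented by hand, and the helper names phrase_vector and cosine_similarity are hypothetical, not part of word2vec, GloVe, or fastText. With a real model you would look up the pre-trained vectors instead.

```python
import math

# Hypothetical 3-dimensional toy vectors, invented for illustration only.
# Real word2vec/GloVe/fastText vectors typically have hundreds of dimensions.
TOY_VECTORS = {
    "girl":   [0.9, 0.1, 0.0],
    "chick":  [0.8, 0.2, 0.1],
    "pretty": [0.1, 0.9, 0.0],
    "catch":  [0.2, 0.8, 0.1],
    "train":  [0.0, 0.1, 0.9],
    "late":   [0.1, 0.0, 0.8],
}


def phrase_vector(tokens, vectors):
    """Average the vectors of all tokens that have an embedding."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no token in the phrase has an embedding")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Words without a toy vector ("that", "is", ...) are simply skipped.
slang = phrase_vector("that chick is such a catch".split(), TOY_VECTORS)
formal = phrase_vector("this girl is pretty".split(), TOY_VECTORS)
other = phrase_vector("the train is late".split(), TOY_VECTORS)

print(cosine_similarity(slang, formal))  # the two paraphrases
print(cosine_similarity(slang, other))   # an unrelated phrase
```

With these toy numbers the first similarity comes out close to 1 and the second much lower, which is the behaviour one hopes for. Plain averaging ignores word order, so it is a rough baseline rather than a full phrase representation.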
On 19 Oct 2018, at 09:41, Alexander Osherenko <osherenko at gmx.de> wrote:
I wonder if it is possible to represent NL phrases geometrically, for example, to compare their similarity. For example, the phrase "Hey man, that chick is such a catch!" and the more formal "..., this girl is pretty!" should be represented geometrically nearby because they are semantically similar.
I am aware of LSA vectors that represent particular words, where similarity can be evaluated as a distance between these word vectors in the LSA space. However, the LSA approach only works for individual words, not for phrases, and it is IMHO too numerical because it doesn't consider the semantics of the participating words.
Best, Alexander
--
Alexander Osherenko, Dr. rer. nat.
Senior HCI architect
Founder and R&D, Socioware Development: http://www.socioware.de/osherenko_page.html
Profile: ResearchGate, https://www.researchgate.net/profile/Alexander_Osherenko
Implementing Social Smart Environments with a Large Number of Believable Inhabitants in the Context of Globalization, at Springer: https://www.researchgate.net/publication/327425719_Implementing_Social_Smart_Environments_with_a_Large_Number_of_Believable_Inhabitants_in_the_Context_of_Globalization
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
https://mailman.uib.no/listinfo/corpora
-- Men who become accustomed to worrying about the needs of machines become callous about the needs of men (Isaac Asimov)
Ignacio J. Iacobacci
iiacobac at gmail.com
iiacobacci at dc.uba.ar
iacobacci at di.uniroma1.it