[Corpora-List] Geometrical representation of NL phrases for similarity comparison

Daniel Cer cer at google.com
Fri Oct 19 17:06:22 CEST 2018


Hi Alexander,

You could try using the Universal Sentence Encoder: https://tfhub.dev/google/universal-sentence-encoder/2

It performs well on sentence-level semantic textual similarity (STS). There's an online demo / notebook available here: https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder.ipynb

The demo includes a simple pairwise similarity visualization as well as example code for using the encoder for STS. If you want the best results, I would recommend the transformer-based model (universal-sentence-encoder-large).
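
As a rough illustration of the pairwise comparison that such a visualization is built on, here is a minimal sketch in plain Python. It assumes each sentence has already been encoded into a fixed-length vector. The three vectors below are made-up toy values rather than real encoder output, and `cosine` and `pairwise_similarity` are hypothetical helper names, not part of the TF Hub module.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_similarity(embeddings):
    """Return the full matrix of cosine similarities between all vectors."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]

# Toy stand-ins for sentence embeddings (assumed values, for illustration only).
embeddings = [
    [0.9, 0.1, 0.0],  # sentence A
    [0.8, 0.2, 0.1],  # a paraphrase of sentence A
    [0.0, 0.1, 0.9],  # an unrelated sentence
]

sim = pairwise_similarity(embeddings)
for row in sim:
    print(" ".join(f"{x:.2f}" for x in row))
```

With real embeddings, rows for semantically close sentences should show higher values off the diagonal, which is what a heatmap of this matrix makes visible.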

Disclaimer: I'm one of the authors. We'll be presenting it during the EMNLP demo section later this month.

Dan

On Fri, Oct 19, 2018 at 3:03 AM Jindrich Libovicky < libovicky at ufal.mff.cuni.cz> wrote:


> Hi Alexander,
>
> I would recommend something like ELMo: https://allennlp.org/elmo,
> https://arxiv.org/abs/1802.05365
>
> It is a large pre-trained language model that works well on most of the
> semantic tasks (https://gluebenchmark.com). There are a bunch of models
> that perform even better, but I am not sure how easily available they are.
> For ELMo, you just need to install AllenNLP.
>
> Regards,
> Jindřich
>
> ----- Original Message -----
> From: "Ignacio J. Iacobacci" <iiacobac at gmail.com>
> To: osherenko at gmx.de
> Cc: corpora at uib.no
> Sent: Friday, 19 October, 2018 11:13:42
> Subject: Re: [Corpora-List] Geometrical representation of NL phrases for
> similarity comparison
>
> Hello Alexander,
>
> There are many options, much better than this one, but doc2vec, the
> extension of word2vec for sentences and documents, will work for you:
> https://radimrehurek.com/gensim/models/doc2vec.html
>
> All the best!
>
> Ignacio
>
>
> On Fri, 19 Oct 2018 at 10:10, Alexander Osherenko
> <osherenko at gmx.de> wrote:
>
>
>
> Thanks, Mohammad. Unfortunately, I am looking for a geometric
> representation of phrases, not of words.
>
> Best, Alexander
>
>
> On Fri, 19 Oct 2018 at 11:01, Mohammad Akbari
> <akbari.ma at gmail.com> wrote:
>
>
>
> Hello Alexander,
>
> Word embedding models, such as word2vec and GloVe, are a common approach:
> each word is represented by a numerical vector
> (https://arxiv.org/pdf/1310.4546.pdf,
> https://code.google.com/archive/p/word2vec/). Once you have word
> embeddings, you can do geometric computations on those vectors. A common
> approach is to compute the average embedding of all words in a phrase; you
> can check fastText for this purpose.
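
The averaging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2-dimensional word vectors are invented for the example (real word2vec, GloVe or fastText vectors have hundreds of dimensions), and `average_embedding` is a hypothetical helper, not a function from any of those libraries.

```python
def average_embedding(phrase, word_vectors):
    """Average the vectors of the in-vocabulary words of a phrase."""
    # Naive whitespace tokenization; out-of-vocabulary words are skipped.
    vectors = [word_vectors[w] for w in phrase.lower().split() if w in word_vectors]
    if not vectors:
        raise ValueError("no known words in phrase")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Hypothetical 2-dimensional word vectors, made up for illustration.
word_vectors = {
    "girl": [1.0, 0.0],
    "pretty": [0.0, 1.0],
    "is": [0.5, 0.5],
}

phrase_vec = average_embedding("girl is pretty", word_vectors)
print(phrase_vec)
```

Two phrase vectors obtained this way can then be compared with any vector distance, for example cosine similarity. Note that averaging discards word order, so "dog bites man" and "man bites dog" receive the same vector.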
>
>
> Regards,
> Mohammad
>
>
>
>
> On 19 Oct 2018, at 09:41, Alexander Osherenko <osherenko at gmx.de>
> wrote:
>
> Hi,
>
> I wonder whether it is possible to represent NL phrases geometrically, for
> example, to compare their similarity. The phrase "Hey man, that chick is
> such a catch!" and the more formal "..., this girl is pretty!" should be
> represented geometrically close to each other because they are semantically
> similar.
>
> I am aware of LSA vectors that represent particular words; similarity
> can be evaluated as a distance between these word vectors in the LSA
> space. However, the LSA approach only works for individual words, not for
> phrases, and it is IMHO too numerical because it doesn't consider the
> semantics of the participating words.
>
> Best, Alexander
> --
> Alexander Osherenko, Dr. rer. nat.
> Senior HCI architect
> Founder and R&D
> [ http://www.socioware.de/osherenko_page.html | Socioware Development ]
> Profile: [ https://www.researchgate.net/profile/Alexander_Osherenko |
> ResearchGate ]
> [
> https://www.researchgate.net/publication/327425719_Implementing_Social_Smart_Environments_with_a_Large_Number_of_Believable_Inhabitants_in_the_Context_of_Globalization
> | Implementing Social Smart Environments with a Large Number of Believable
> Inhabitants in the Context of Globalization ] at Springer
> _______________________________________________
> UNSUBSCRIBE from this page: [ http://mailman.uib.no/options/corpora |
> http://mailman.uib.no/options/corpora ]
> Corpora mailing list
> [ mailto:Corpora at uib.no | Corpora at uib.no ]
> [ https://mailman.uib.no/listinfo/corpora |
> https://mailman.uib.no/listinfo/corpora ]
>
>
>
> --
> Men who become accustomed to worrying about the needs of machines become
> callous about the needs of men
> (Isaac Asimov)
>
> Ignacio J. Iacobacci
> [ mailto:iiacobac at gmail.com | iiacobac at gmail.com ]
> [ mailto:iiacobacci at dc.uba.ar | iiacobacci at dc.uba.ar ]
> [ mailto:iacobacci at di.uniroma1.it | iacobacci at di.uniroma1.it ]
>
>
>