[Corpora-List] Geometrical representation of NL phrases for similarity comparison

Alexander Osherenko osherenko at gmx.de
Fri Oct 19 17:20:12 CEST 2018


Thanks, guys, for your input -- very interesting; I will evaluate all the approaches. Actually, I am looking primarily for a geometric representation of phrases that I can use for different purposes, for example, for comparison. As I saw, some approaches calculate a scalar value that can be represented as a point on a line -- a good starting point for my evaluation. I am curious if there are other representations of phrases that can be visualized, for instance, as a point in a 2D plane or in 3D space.
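
One way to get such 2D points is to project high-dimensional phrase vectors onto their two main directions of variation (principal component analysis). The following is only a minimal sketch: the 4-dimensional phrase vectors are invented for illustration, and the helper names (center, top_direction, project_2d) are hypothetical, not taken from any library mentioned in this thread. A real phrase encoder would supply vectors with hundreds of dimensions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(vectors):
    """Subtract the per-dimension mean so the data is centred at the origin."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors]


def top_direction(rows, iterations=200, seed=0):
    """Approximate the dominant principal direction by power iteration."""
    rng = random.Random(seed)
    dim = len(rows[0])
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        # Apply X^T X to w without building the covariance matrix.
        scores = [dot(r, w) for r in rows]
        w_new = [sum(s * r[i] for s, r in zip(scores, rows)) for i in range(dim)]
        norm = math.sqrt(dot(w_new, w_new))
        if norm < 1e-12:  # degenerate data: no variance left
            break
        w = [x / norm for x in w_new]
    return w


def project_2d(vectors):
    """Map each vector to (x, y) coordinates on the top two principal axes."""
    rows = center(vectors)
    w1 = top_direction(rows)
    # Remove the first direction, then look for the next strongest one.
    deflated = [[x - dot(r, w1) * a for x, a in zip(r, w1)] for r in rows]
    w2 = top_direction(deflated, seed=1)
    return [(dot(r, w1), dot(r, w2)) for r in rows]


# Hypothetical phrase vectors (made-up numbers, for illustration only).
phrase_vectors = {
    "informal": [1.0, 0.9, 0.0, 0.1],
    "formal": [0.9, 1.0, 0.1, 0.0],
    "unrelated": [0.0, 0.1, 1.0, 0.9],
}
points = project_2d(list(phrase_vectors.values()))
for name, (x, y) in zip(phrase_vectors, points):
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

The resulting (x, y) pairs can be handed to any plotting tool. With many phrases the projection loses information, so distances in the plot are only an approximation of distances in the original space.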

On Fri, 19 Oct 2018 at 17:06, Daniel Cer <cer at google.com> wrote:


> Hi Alexander,
>
> You could try using the Universal Sentence Encoder:
> https://tfhub.dev/google/universal-sentence-encoder/2
>
> It performs well on sentence-level semantic textual similarity. There's an
> online demo / notebook available here:
> https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder.ipynb
>
> The demo includes a simple pairwise similarity visualization as well as
> example code for using it for STS. If you want to get the best results I
> would recommend using the transformer based model
> (universal-sentence-encoder-large).
>
> Disclaimer: I'm one of the authors. We'll be presenting it during the
> EMNLP demo section later this month.
>
> Dan
>
>
>
> On Fri, Oct 19, 2018 at 3:03 AM Jindrich Libovicky <
> libovicky at ufal.mff.cuni.cz> wrote:
>
>> Hi Alexander,
>>
>> I would recommend something like ELMo: https://allennlp.org/elmo,
>> https://arxiv.org/abs/1802.05365
>>
>> It is a large pre-trained language model that works well on most of the
>> semantic tasks (https://gluebenchmark.com). There are a bunch of models
>> that perform even better, but I am not sure how easily available they are.
>> For ELMo, you just need to install AllenNLP.
>>
>> Regards,
>> Jindřich
>>
>> ----- Original Message -----
>> From: "Ignacio J. Iacobacci" <iiacobac at gmail.com>
>> To: osherenko at gmx.de
>> Cc: corpora at uib.no
>> Sent: Friday, 19 October, 2018 11:13:42
>> Subject: Re: [Corpora-List] Geometrical representation of NL phrases for
>> similarity comparison
>>
>> Hello Alexander,
>>
>> There are many options, much better than this one, but doc2vec, the
>> extension of word2vec to sentences and documents, will work for you:
>> https://radimrehurek.com/gensim/models/doc2vec.html
>>
>> All the best!
>>
>> Ignacio
>>
>>
>> On Fri, 19 Oct 2018 at 10:10, Alexander Osherenko <osherenko at gmx.de>
>> wrote:
>>
>>
>>
>> Thanks, Mohammad. Unfortunately, I am looking for a geometric representation
>> of phrases, not of words.
>>
>> Best, Alexander
>>
>>
>> On Fri, 19 Oct 2018 at 11:01, Mohammad Akbari <akbari.ma at gmail.com>
>> wrote:
>>
>>
>>
>> Hello Alexander,
>>
>> Word embedding models, such as word2vec and GloVe, are common
>> approaches, in which words are represented as numerical vectors
>> (https://arxiv.org/pdf/1310.4546.pdf,
>> https://code.google.com/archive/p/word2vec/). Once you have word
>> embeddings, you can do geometric computations based on their vectors. A
>> common approach is to compute the average embedding of all words in a
>> phrase; you can check fastText for this purpose.
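
The averaging approach described above can be sketched as follows. This is a minimal illustration, not fastText itself: the 3-dimensional word vectors are invented toy values, and phrase_vector and cosine are hypothetical helper names. In practice the vectors would come from a pre-trained model such as word2vec, GloVe or fastText.

```python
import math

# Toy word vectors, made up for illustration only; a pre-trained model
# would provide real ones with a few hundred dimensions.
TOY_VECTORS = {
    "girl": [0.9, 0.1, 0.0],
    "chick": [0.8, 0.2, 0.1],
    "pretty": [0.1, 0.9, 0.0],
    "catch": [0.2, 0.8, 0.1],
    "train": [0.0, 0.1, 0.9],
    "late": [0.1, 0.0, 0.8],
}


def phrase_vector(phrase, vectors=TOY_VECTORS):
    """Average the vectors of all known words in the phrase."""
    tokens = [t.strip(".,!?\"'") for t in phrase.lower().split()]
    known = [t for t in tokens if t in vectors]
    if not known:
        raise ValueError("no known words in phrase: %r" % phrase)
    dim = len(vectors[known[0]])
    return [sum(vectors[t][i] for t in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity: 1.0 for vectors pointing the same way, 0.0 if orthogonal."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


informal = phrase_vector("Hey man, that chick is such a catch!")
formal = phrase_vector("this girl is pretty!")
unrelated = phrase_vector("The train is late.")

print("informal vs formal:   ", round(cosine(informal, formal), 3))
print("informal vs unrelated:", round(cosine(informal, unrelated), 3))
```

Words missing from the vocabulary are simply skipped here, and word order is ignored entirely; both are known limitations of plain averaging.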
>>
>>
>> Regards,
>> Mohammad
>>
>>
>>
>>
>> On 19 Oct 2018, at 09:41, Alexander Osherenko <osherenko at gmx.de>
>> wrote:
>>
>> Hi,
>>
>> I wonder if it is possible to represent NL phrases geometrically, for
>> example, to compare their similarity. For example, the phrase "Hey man,
>> that chick is such a catch!" and the more formal "..., this girl is
>> pretty!" should be represented geometrically close together because they
>> are semantically similar.
>>
>> I am aware of LSA vectors that represent particular words; similarity
>> can then be evaluated as a distance between these word vectors in the LSA
>> space. However, the LSA approach only works for individual words, not for
>> phrases, and it is IMHO too numerical because it doesn't consider the
>> semantics of the participating words.
>>
>> Best, Alexander
>> --
>> Alexander Osherenko, Dr. rer. nat.
>> Senior HCI architect
>> Founder and R&D
>> Socioware Development: http://www.socioware.de/osherenko_page.html
>> Profile (ResearchGate):
>> https://www.researchgate.net/profile/Alexander_Osherenko
>> "Implementing Social Smart Environments with a Large Number of Believable
>> Inhabitants in the Context of Globalization" at Springer:
>> https://www.researchgate.net/publication/327425719_Implementing_Social_Smart_Environments_with_a_Large_Number_of_Believable_Inhabitants_in_the_Context_of_Globalization
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> https://mailman.uib.no/listinfo/corpora
>>
>>
>> --
>> Men who become accustomed to worrying about the needs of machines become
>> callous about the needs of men
>> (Isaac Asimov)
>>
>> Ignacio J. Iacobacci
>> iiacobac at gmail.com
>> iiacobacci at dc.uba.ar
>> iacobacci at di.uniroma1.it
>>
>>
>>
>