[Corpora-List] Geometrical representation of NL phrases for similarity comparison

vinicius at open.inf.br vinicius at open.inf.br
Fri Oct 19 23:35:16 CEST 2018


Dear Alexander,

If you use a vector representation of a sentence, you will have more than 50 dimensions (e.g., using word2vec vectors). I think that reducing it to 2D/3D will lose a lot of information, which could make it difficult to represent the sentence faithfully, even for visualization purposes.
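
One way to gauge that loss is to project the vectors with PCA and check how much variance the first two components keep. The following is only a minimal sketch under stated assumptions: the 20 random 50-dimensional vectors are placeholders for real sentence embeddings, and `pca_project` is a hypothetical helper name, not part of any library mentioned in this thread.

```python
import numpy as np

def pca_project(vectors, k=2):
    """Project row vectors onto their top-k principal components.

    Returns the k-dimensional coordinates and the fraction of the total
    variance that those k components retain.
    """
    centered = vectors - vectors.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions,
    # and the squared singular values are proportional to explained variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:k].T
    variance = singular_values ** 2
    retained = variance[:k].sum() / variance.sum()
    return coords, retained

# Placeholder data (an assumption): random vectors standing in for sentence
# embeddings; real ones would come from a trained model.
rng = np.random.default_rng(0)
sentence_vectors = rng.normal(size=(20, 50))

coords_2d, retained = pca_project(sentence_vectors, k=2)
print(coords_2d.shape)
print(f"variance retained in 2D: {retained:.2%}")
```

Real embeddings have more structure than random vectors, so the retained fraction will differ, but a low value is a warning that distances in the 2D plot may not reflect distances in the original space.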

Best, Vinicius

On Fri, Oct 19, 2018 at 12:26 PM Alexander Osherenko <osherenko at gmx.de> wrote:


> Thanks, guys, for your input -- very interesting, I will evaluate all
> approaches. Actually, I am looking primarily for a geometric representation
> of phrases that I can use for different purposes, for example, for
> comparison. As I saw, some approaches calculate a scalar value that can be
> represented as a point on a line -- it is a good beginning for my
> evaluation. I am curious if there are other representations of phrases that
> can be visualized, for instance, as a point in 2D/3D space.
>
>
> On Fri, Oct 19, 2018 at 17:06, Daniel Cer <cer at google.com> wrote:
>
>> Hi Alexander,
>>
>> You could try using the Universal Sentence Encoder:
>> https://tfhub.dev/google/universal-sentence-encoder/2
>>
>> It performs well on sentence level semantic textual similarity. There's
>> an online demo / notebook available here:
>> https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder.ipynb
>>
>> The demo includes a simple pairwise similarity visualization as well as
>> example code for using it for STS. If you want to get the best results I
>> would recommend using the transformer based model
>> (universal-sentence-encoder-large).
>>
>> Disclaimer: I'm one of the authors. We'll be presenting it during the
>> EMNLP demo section later this month.
>>
>> Dan
>>
>>
>>
>> On Fri, Oct 19, 2018 at 3:03 AM Jindrich Libovicky <
>> libovicky at ufal.mff.cuni.cz> wrote:
>>
>>> Hi Alexander,
>>>
>>> I would recommend something like ELMo: https://allennlp.org/elmo,
>>> https://arxiv.org/abs/1802.05365
>>>
>>> It is a large pre-trained language model that works well on most of the
>>> semantic tasks (https://gluebenchmark.com). There are a bunch of models
>>> that perform even better, but I am not sure how easily available they are.
>>> For ELMo, you just need to install AllenNLP.
>>>
>>> Regards,
>>> Jindřich
>>>
>>> ----- Original Message -----
>>> From: "Ignacio J. Iacobacci" <iiacobac at gmail.com>
>>> To: osherenko at gmx.de
>>> Cc: corpora at uib.no
>>> Sent: Friday, 19 October, 2018 11:13:42
>>> Subject: Re: [Corpora-List] Geometrical representation of NL phrases for
>>> similarity comparison
>>>
>>> Hello Alexander,
>>>
>>> There are many options, much better than this one, but doc2vec, the
>>> extension of word2vec for sentences and documents, will work for you:
>>> https://radimrehurek.com/gensim/models/doc2vec.html
>>>
>>> All the best!
>>>
>>> Ignacio
>>>
>>>
>>> On Fri, Oct 19, 2018 at 10:10, Alexander Osherenko
>>> <osherenko at gmx.de> wrote:
>>>
>>>
>>>
>>> Thanks, Mohammad. Unfortunately, I am looking for a geometric
>>> representation of phrases, not of words.
>>>
>>> Best, Alexander
>>>
>>>
>>> On Fri, Oct 19, 2018 at 11:01, Mohammad Akbari
>>> <akbari.ma at gmail.com> wrote:
>>>
>>>
>>>
>>> Hello Alexander,
>>>
>>> Word embedding models, such as word2vec and GloVe, are common
>>> approaches, in which words are represented as numerical vectors
>>> (https://arxiv.org/pdf/1310.4546.pdf,
>>> https://code.google.com/archive/p/word2vec/). Once you have word
>>> embeddings, you can do geometric computations on their vectors. A common
>>> approach is to compute the average embedding of all words in a phrase; you
>>> can check fastText for this purpose.
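
The averaging approach described above can be sketched as follows. This is a minimal illustration under stated assumptions: the 3-dimensional `EMBEDDINGS` table is invented toy data, and `phrase_vector` and `cosine_similarity` are hypothetical helper names; real vectors would be loaded from a trained word2vec, GloVe or fastText model.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
EMBEDDINGS = {
    "girl":   [0.9, 0.1, 0.0],
    "chick":  [0.8, 0.2, 0.1],
    "pretty": [0.1, 0.9, 0.0],
    "catch":  [0.2, 0.8, 0.1],
    "train":  [0.0, 0.1, 0.9],
    "late":   [0.1, 0.0, 0.8],
}

def phrase_vector(phrase):
    """Average the embeddings of the words in the phrase that are known."""
    vectors = [EMBEDDINGS[w] for w in phrase.lower().split() if w in EMBEDDINGS]
    if not vectors:
        raise ValueError("no known words in phrase")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

informal = phrase_vector("that chick is such a catch")
formal = phrase_vector("this girl is pretty")
unrelated = phrase_vector("the train is late")

print(cosine_similarity(informal, formal))
print(cosine_similarity(informal, unrelated))
```

With these toy values the two paraphrases end up much closer to each other than to the unrelated phrase. Note that plain averaging ignores word order and silently skips out-of-vocabulary words, which is one reason the sentence-level encoders suggested elsewhere in this thread may work better.
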
>>>
>>>
>>> Regards,
>>> Mohammad
>>>
>>>
>>>
>>>
>>> On 19 Oct 2018, at 09:41, Alexander Osherenko <osherenko at gmx.de> wrote:
>>>
>>> Hi,
>>>
>>> I wonder if it is possible to represent NL phrases geometrically, for
>>> example, to compare their similarity. For example, the phrase "Hey man,
>>> that chick is such a catch!" and the more formal "..., this girl is pretty!"
>>> should be represented geometrically nearby because they are semantically
>>> similar.
>>>
>>> I am aware of LSA vectors that represent particular words, and that
>>> similarity can be evaluated as the distance between these word vectors in
>>> the LSA space. However, the LSA approach only works for individual words,
>>> not for phrases, and it is IMHO too numerical because it doesn't consider
>>> the semantics of the participating words.
>>>
>>> Best, Alexander
>>> --
>>> Alexander Osherenko, Dr. rer. nat.
>>> Senior HCI architect
>>> Founder and R&D
>>> [ http://www.socioware.de/osherenko_page.html | Socioware Development ]
>>> Profile: [ https://www.researchgate.net/profile/Alexander_Osherenko |
>>> ResearchGate ]
>>> [
>>> https://www.researchgate.net/publication/327425719_Implementing_Social_Smart_Environments_with_a_Large_Number_of_Believable_Inhabitants_in_the_Context_of_Globalization
>>> | Implementing Social Smart Environments with a Large Number of Believable
>>> Inhabitants in the Context of Globalization ] at Springer
>>> _______________________________________________
>>> UNSUBSCRIBE from this page: [ http://mailman.uib.no/options/corpora |
>>> http://mailman.uib.no/options/corpora ]
>>> Corpora mailing list
>>> [ mailto:Corpora at uib.no | Corpora at uib.no ]
>>> [ https://mailman.uib.no/listinfo/corpora |
>>> https://mailman.uib.no/listinfo/corpora ]
>>>
>>>
>>> --
>>> Men who become accustomed to worrying about the needs of machines become
>>> callous about the needs of men
>>> (Isaac Asimov)
>>>
>>> Ignacio J. Iacobacci
>>> iiacobac at gmail.com
>>> iiacobacci at dc.uba.ar
>>> iacobacci at di.uniroma1.it
>>>
>>>
>>>