[Corpora-List] Geometrical representation of NL phrases for similarity comparison

NAGOUDI El Moatez Billah e_nagoudi at esi.dz
Wed Oct 24 11:18:17 CEST 2018


Dear Alexander,

You could have a look at:

http://aclweb.org/anthology/W17-1303 and https://www.springer.com/cda/content/document/cda_downloaddocument/9783319734996-c2.pdf?SGWID=0-0-45-1629439-p181328420

Best

Moatez

On Tue, Oct 23, 2018 at 15:51, Fabio Massimo Zanzotto <fabio.massimo.zanzotto at uniroma2.it> wrote:


> Dear Alexander,
>
> There is another approach to encoding sentences as vectors that takes
> their syntactic structure into account: the Distributed Tree Kernel
> <https://icml.cc/Conferences/2012/papers/111.pdf> and its semantic
> extension <http://aclweb.org/anthology/C14-1068>.
> The idea stems from the "convolutional conjecture"
> <https://doi.org/10.1162/COLI_a_00215> that underlies this kind of
> vector-based representation.
>
> Hope it helps!
>
> Best,
> Fabio
>
>
>
> On Fri, Oct 19, 2018 at 6:41 PM vinicius at open.inf.br <vinicius at open.inf.br>
> wrote:
>
>> Dear Alexander,
>>
>> If you use a vector representation of a sentence, you will have far more
>> than 50 dimensions (e.g. using word2vec vectors). I think that with a 2D/3D
>> simplification you will lose a lot of information, which makes it hard to
>> represent the sentence faithfully - even for visualization purposes.
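>>
>> A rough sketch of such a 2D simplification with PCA (scikit-learn assumed;
>> the sentence_vectors below are a random placeholder for whatever encoder
>> output you actually use) -- the explained-variance ratio makes the
>> information loss explicit:
>>
>> import numpy as np
>> from sklearn.decomposition import PCA
>>
>> # Placeholder: pretend 10 sentences are already embedded in 300 dimensions
>> # (e.g. averaged word2vec vectors or some sentence-encoder output).
>> sentence_vectors = np.random.rand(10, 300)
>>
>> # Project down to two dimensions for plotting.
>> pca = PCA(n_components=2)
>> points_2d = pca.fit_transform(sentence_vectors)
>>
>> # Fraction of the original variance that survives the 2D simplification.
>> print(pca.explained_variance_ratio_.sum())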
>>
>> Best,
>> Vinicius
>>
>> On Fri, Oct 19, 2018 at 12:26 PM Alexander Osherenko <osherenko at gmx.de>
>> wrote:
>>
>>> Thanks, guys, for your input -- very interesting, I will evaluate all
>>> the approaches. Actually, I am looking primarily for a geometric
>>> representation of phrases that I can use for different purposes, for
>>> example, for comparison. As I saw, some approaches compute a scalar value
>>> that can be represented as a point on a line -- that is a good starting
>>> point for my evaluation. I am curious if there are other representations
>>> of phrases that can be visualized, for instance, as a point in 2D or 3D
>>> space.
>>>
>>>
>>> On Fri, Oct 19, 2018 at 17:06, Daniel Cer <cer at google.com> wrote:
>>>
>>>> Hi Alexander,
>>>>
>>>> You could try using the Universal Sentence Encoder:
>>>> https://tfhub.dev/google/universal-sentence-encoder/2
>>>>
>>>> It performs well on sentence-level semantic textual similarity. There's
>>>> an online demo / notebook available here:
>>>> https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder.ipynb
>>>>
>>>> The demo includes a simple pairwise similarity visualization as well as
>>>> example code for using it for STS. If you want the best results, I would
>>>> recommend using the transformer-based model
>>>> (universal-sentence-encoder-large).
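>>>>
>>>> For completeness, a minimal sketch of one way to query the module with
>>>> the TF1-era Hub API (module URL taken from above; the example sentences
>>>> and the cosine computation are just an illustration, not part of the
>>>> published demo):
>>>>
>>>> import numpy as np
>>>> import tensorflow as tf
>>>> import tensorflow_hub as hub
>>>>
>>>> # Load the Universal Sentence Encoder and build the embedding op.
>>>> embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/2")
>>>> sentences = ["Hey man, that chick is such a catch!", "This girl is pretty!"]
>>>> embeddings = embed(sentences)
>>>>
>>>> with tf.Session() as sess:
>>>>     sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
>>>>     vectors = sess.run(embeddings)  # shape (2, 512)
>>>>
>>>> # Cosine similarity between the two sentence vectors.
>>>> v1, v2 = vectors
>>>> print(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))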
>>>>
>>>> Disclaimer: I'm one of the authors. We'll be presenting it during the
>>>> EMNLP demo session later this month.
>>>>
>>>> Dan
>>>>
>>>>
>>>>
>>>> On Fri, Oct 19, 2018 at 3:03 AM Jindrich Libovicky <
>>>> libovicky at ufal.mff.cuni.cz> wrote:
>>>>
>>>>> Hi Alexander,
>>>>>
>>>>> I would recommend something like ELMo: https://allennlp.org/elmo,
>>>>> https://arxiv.org/abs/1802.05365
>>>>>
>>>>> It is a large pre-trained language model that works well on most
>>>>> semantic tasks (https://gluebenchmark.com). There are a bunch of
>>>>> models that perform even better, but I am not sure how readily
>>>>> available they are. For ELMo, you just need to install AllenNLP.
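>>>>>
>>>>> A minimal sketch of getting fixed-size phrase vectors out of AllenNLP's
>>>>> pre-trained ELMo (the mean pooling over layers and tokens is just one
>>>>> simple choice, not something ELMo prescribes):
>>>>>
>>>>> import numpy as np
>>>>> from allennlp.commands.elmo import ElmoEmbedder  # pip install allennlp
>>>>>
>>>>> elmo = ElmoEmbedder()  # downloads the pre-trained weights on first use
>>>>>
>>>>> def phrase_vector(tokens):
>>>>>     # embed_sentence returns an array of shape (3 layers, n_tokens, 1024).
>>>>>     layers = elmo.embed_sentence(tokens)
>>>>>     # Average over layers and tokens to get one fixed-size vector.
>>>>>     return layers.mean(axis=(0, 1))
>>>>>
>>>>> v1 = phrase_vector("Hey man , that chick is such a catch !".split())
>>>>> v2 = phrase_vector("This girl is pretty !".split())
>>>>> print(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))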
>>>>>
>>>>> Regards,
>>>>> Jindřich
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Ignacio J. Iacobacci" <iiacobac at gmail.com>
>>>>> To: osherenko at gmx.de
>>>>> Cc: corpora at uib.no
>>>>> Sent: Friday, 19 October, 2018 11:13:42
>>>>> Subject: Re: [Corpora-List] Geometrical representation of NL phrases
>>>>> for similarity comparison
>>>>>
>>>>> Hello Alexander,
>>>>>
>>>>> There are many options, some much better than this one, but doc2vec,
>>>>> the extension of word2vec to sentences and documents, will work for you:
>>>>> https://radimrehurek.com/gensim/models/doc2vec.html
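>>>>>
>>>>> A minimal gensim sketch (recent gensim assumed; the toy corpus is far
>>>>> too small for meaningful vectors and is only there to show the API):
>>>>>
>>>>> import numpy as np
>>>>> from gensim.models.doc2vec import Doc2Vec, TaggedDocument
>>>>>
>>>>> # Toy corpus; in practice you train on a much larger collection.
>>>>> corpus = ["hey man that chick is such a catch",
>>>>>           "this girl is pretty",
>>>>>           "the stock market fell sharply today"]
>>>>> documents = [TaggedDocument(words=text.split(), tags=[i])
>>>>>              for i, text in enumerate(corpus)]
>>>>>
>>>>> model = Doc2Vec(documents, vector_size=50, min_count=1, epochs=40)
>>>>>
>>>>> # Infer vectors for new phrases and compare them geometrically.
>>>>> v1 = model.infer_vector("that chick is such a catch".split())
>>>>> v2 = model.infer_vector("this girl is pretty".split())
>>>>> print(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))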
>>>>>
>>>>> All the best!
>>>>>
>>>>> Ignacio
>>>>>
>>>>>
>>>>> On Fri, Oct 19, 2018 at 10:10, Alexander Osherenko
>>>>> <osherenko at gmx.de> wrote:
>>>>>
>>>>>
>>>>>
>>>>> Thanks, Mohammad. Unfortunately, I am looking for a geometric
>>>>> representation of phrases, not of words.
>>>>>
>>>>> Best, Alexander
>>>>>
>>>>>
>>>>> On Fri, Oct 19, 2018 at 11:01, Mohammad Akbari
>>>>> <akbari.ma at gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> Hello Alexander,
>>>>>
>>>>> Word embedding models such as word2vec and GloVe are common
>>>>> approaches, where each word is represented by a numerical vector
>>>>> (https://arxiv.org/pdf/1310.4546.pdf,
>>>>> https://code.google.com/archive/p/word2vec/). Once you have word
>>>>> embeddings, you can do geometric computations on these vectors. A
>>>>> common approach is to compute the average embedding of all words in a
>>>>> phrase; you can check fastText for this purpose.
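>>>>>
>>>>> A minimal sketch of the averaging approach with gensim (the GloVe model
>>>>> name below is just one convenient pre-trained choice, not a
>>>>> recommendation):
>>>>>
>>>>> import numpy as np
>>>>> import gensim.downloader as api
>>>>>
>>>>> # Any pre-trained word embedding works here.
>>>>> wv = api.load("glove-wiki-gigaword-100")
>>>>>
>>>>> def phrase_vector(phrase):
>>>>>     # Average the vectors of the in-vocabulary words (bag of embeddings).
>>>>>     words = [w for w in phrase.lower().split() if w in wv]
>>>>>     return np.mean([wv[w] for w in words], axis=0)
>>>>>
>>>>> v1 = phrase_vector("that chick is such a catch")
>>>>> v2 = phrase_vector("this girl is pretty")
>>>>> print(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))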
>>>>>
>>>>>
>>>>> Regards,
>>>>> Mohammad
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 19 Oct 2018, at 09:41, Alexander Osherenko
>>>>> <osherenko at gmx.de> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I wonder if it is possible to represent NL phrases geometrically, for
>>>>> example, in order to compare their similarity. For instance, the phrase
>>>>> "Hey man, that chick is such a catch!" and the more formal "..., this girl
>>>>> is pretty!" should be placed geometrically close to each other because they
>>>>> are semantically similar.
>>>>>
>>>>> I am aware of LSA vectors that represent individual words, where
>>>>> similarity can be evaluated as a distance between word vectors in the LSA
>>>>> space. However, the LSA approach only works for individual words, not
>>>>> phrases, and it is IMHO too purely numerical because it doesn't consider
>>>>> the semantics of the participating words.
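>>>>>
>>>>> For concreteness, a toy scikit-learn sketch of that LSA-space comparison
>>>>> (folding whole phrases into the space as bags of words is a simplification
>>>>> used only to illustrate the purely numerical treatment):
>>>>>
>>>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>>>> from sklearn.decomposition import TruncatedSVD
>>>>> from sklearn.metrics.pairwise import cosine_similarity
>>>>>
>>>>> # Toy corpus used to build the LSA space; real use needs far more text.
>>>>> corpus = ["hey man that chick is such a catch",
>>>>>           "this girl is pretty",
>>>>>           "the stock market fell sharply today"]
>>>>>
>>>>> tfidf = TfidfVectorizer()
>>>>> X = tfidf.fit_transform(corpus)
>>>>>
>>>>> lsa = TruncatedSVD(n_components=2)  # tiny space, for illustration only
>>>>> phrase_vectors = lsa.fit_transform(X)
>>>>>
>>>>> # Geometric similarity between the first two phrases in the LSA space.
>>>>> print(cosine_similarity(phrase_vectors[0:1], phrase_vectors[1:2]))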
>>>>>
>>>>> Best, Alexander
>>>>> --
>>>>> Alexander Osherenko, Dr. rer. nat.
>>>>> Senior HCI architect
>>>>> Founder and R&D
>>>>> Socioware Development: http://www.socioware.de/osherenko_page.html
>>>>> ResearchGate profile: https://www.researchgate.net/profile/Alexander_Osherenko
>>>>> "Implementing Social Smart Environments with a Large Number of Believable
>>>>> Inhabitants in the Context of Globalization" at Springer:
>>>>> https://www.researchgate.net/publication/327425719_Implementing_Social_Smart_Environments_with_a_Large_Number_of_Believable_Inhabitants_in_the_Context_of_Globalization
>>>>>
>>>>>
>>>>> --
>>>>> Men who become accustomed to worrying about the needs of machines
>>>>> become callous about the needs of men
>>>>> (Isaac Asimov)
>>>>>
>>>>> Ignacio J. Iacobacci
>>>>> iiacobac at gmail.com
>>>>> iiacobacci at dc.uba.ar
>>>>> iacobacci at di.uniroma1.it
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>
>

--
El Moatez Billah NAGOUDI
MAA at Université Chahid Hamma Lakhdar, El Oued.
102, Valmsacort, Oued el kouba, Annaba 23000.


