[Corpora-List] Geometrical representation of NL phrases for similarity comparison

Mohammad Akbari akbari.ma at gmail.com
Fri Oct 19 11:01:02 CEST 2018


Hello Alexander,

Word embedding models such as word2vec and GloVe are a common approach: each word is represented by a numerical vector (https://arxiv.org/pdf/1310.4546.pdf, https://code.google.com/archive/p/word2vec/). Once you have word embeddings, you can do geometric computations on those vectors. A common way to represent a phrase is to average the embeddings of all its words; you can check fastText for this purpose.
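
As a rough sketch of the averaging idea, here is a small Python example. The three-dimensional vectors and the names TOY_EMBEDDINGS, phrase_vector and cosine_similarity are made up for illustration; a real setup would load pretrained vectors from word2vec, GloVe or fastText instead, and would need proper tokenization.

```python
import math

# Hypothetical 3-dimensional toy embeddings, invented for this example.
# Real models learn vectors with hundreds of dimensions from large corpora.
TOY_EMBEDDINGS = {
    "girl":   [0.9, 0.1, 0.0],
    "chick":  [0.8, 0.2, 0.1],
    "pretty": [0.1, 0.9, 0.0],
    "catch":  [0.2, 0.8, 0.1],
    "train":  [0.0, 0.1, 0.9],
    "late":   [0.1, 0.0, 0.8],
}


def phrase_vector(tokens, embeddings=TOY_EMBEDDINGS):
    """Average the vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token of the phrase has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Words without an entry in the toy table are simply skipped.
slang = phrase_vector("that chick is such a catch".split())
formal = phrase_vector("this girl is pretty".split())
other = phrase_vector("the train is late".split())

print(cosine_similarity(slang, formal))  # similar phrases: close to 1
print(cosine_similarity(slang, other))   # unrelated phrase: clearly lower
```

With these toy vectors the two phrases about the girl end up much closer to each other than to the unrelated one. Note that plain averaging ignores word order, so it is only a first approximation of phrase meaning.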

Regards, Mohammad


> On 19 Oct 2018, at 09:41, Alexander Osherenko <osherenko at gmx.de> wrote:
>
> Hi,
>
> I wonder if it is possible to represent NL phrases geometrically, for example, to compare their similarity. For example, the phrase "Hey man, that chick is such a catch!" and more formal "..., this girl is pretty!" should be represented geometrically nearby because they are semantically similar.
>
> I am aware of LSA vectors that represent particular words; similarity can then be evaluated as a distance between these word vectors in the LSA space. However, the LSA approach only works for individual words, not for phrases, and it is IMHO too numerical because it doesn't consider the semantics of the participating words.
>
> Best, Alexander
> --
> Alexander Osherenko, Dr. rer. nat.
> Senior HCI architect
> Founder and R&D
> Socioware Development <http://www.socioware.de/osherenko_page.html>
> Profile: ResearchGate <https://www.researchgate.net/profile/Alexander_Osherenko>
> Implementing Social Smart Environments with a Large Number of Believable Inhabitants in the Context of Globalization <https://www.researchgate.net/publication/327425719_Implementing_Social_Smart_Environments_with_a_Large_Number_of_Believable_Inhabitants_in_the_Context_of_Globalization> at Springer
