[Corpora-List] (New book) Harispe et al: Semantic Similarity from Natural Language and Ontology Analysis

Graeme Hirst gh at cs.toronto.edu
Mon Sep 14 20:35:39 CEST 2015


Semantic Similarity from Natural Language and Ontology Analysis

by Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, and Jacky Montmain (École des mines d’Alès – LGI2P)

Synthesis Lectures on Human Language Technologies #27 (Morgan & Claypool Publishers), 2015, 256 pages


Artificial Intelligence federates numerous scientific fields with the aim of developing machines able to assist human operators in performing complex processing tasks, most of which demand high cognitive skills (e.g., learning or decision processes). Central to this quest is giving machines the ability to estimate the likeness or similarity between things in the way human beings estimate the similarity between stimuli.

In this context, this book focuses on semantic measures: approaches designed for comparing semantic entities such as units of language (e.g., words or sentences) or concepts and instances defined in knowledge bases. The aim of these measures is to assess the similarity or relatedness of such semantic entities by taking into account their semantics, i.e., their meaning. Intuitively, the words tea and coffee, which both refer to stimulating beverages, will be estimated to be more semantically similar than the words toffee (a confection) and coffee, even though the latter pair has a higher syntactic similarity. The two state-of-the-art approaches for estimating and quantifying the semantic similarity or relatedness of semantic entities are presented in detail: the first relies on corpus analysis and is based on Natural Language Processing techniques and semantic models, while the second is based on more or less formal, computer-readable and workable forms of knowledge such as semantic networks, thesauri, or ontologies.
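
The tea/coffee versus toffee/coffee contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a measure taken from the book: the toy taxonomy `PARENT` is hypothetical, the knowledge-based score is a simple inverse path length, and the surface score uses the standard library's `difflib.SequenceMatcher` as a stand-in for what the abstract calls syntactic similarity.

```python
from difflib import SequenceMatcher

# Hypothetical toy taxonomy (child -> parent), invented for this example.
PARENT = {
    "tea": "stimulating beverage",
    "coffee": "stimulating beverage",
    "stimulating beverage": "beverage",
    "toffee": "confection",
    "beverage": "food",
    "confection": "food",
    "food": None,
}


def ancestors(term):
    """Return the path from term up to the root, term included."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def path_similarity(a, b):
    """Illustrative knowledge-based score: 1 / (1 + edges between a and b)."""
    up_a = ancestors(a)
    up_b = ancestors(b)
    # First node on a's path to the root that also lies on b's path.
    common = next(node for node in up_a if node in up_b)
    distance = up_a.index(common) + up_b.index(common)
    return 1.0 / (1.0 + distance)


def string_similarity(a, b):
    """Surface similarity of the two spellings, ignoring meaning."""
    return SequenceMatcher(None, a, b).ratio()


for pair in [("tea", "coffee"), ("toffee", "coffee")]:
    print(pair, "string:", round(string_similarity(*pair), 3),
          "taxonomy:", round(path_similarity(*pair), 3))
```

With this toy taxonomy, toffee and coffee score higher on spelling, while tea and coffee score higher on the taxonomy-based measure, which is the intuition the abstract describes.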

Semantic measures are widely used today to compare units of language, concepts, instances, or even resources indexed by them (e.g., documents, genes). They are central elements of a large variety of Natural Language Processing applications and knowledge-based treatments, and have therefore naturally been subject to intensive and interdisciplinary research efforts over the last decades. Beyond a simple inventory and categorization of existing measures, the aim of this monograph is to guide novices as well as researchers in these domains toward a better understanding of semantic similarity estimation and, more generally, semantic measures. To this end, we propose an in-depth characterization of existing proposals by discussing their features, the assumptions on which they are based, and empirical results regarding their performance in particular applications. By addressing these points and by providing a detailed discussion of the foundations of semantic measures, our aim is to give the reader the key knowledge required to: (i) select the most relevant methods according to a particular usage context, (ii) understand the challenges facing this field of study, (iii) identify room for improvement in state-of-the-art approaches, and (iv) stimulate creativity toward the development of new approaches. With this aim, several definitions, theoretical and practical details, as well as concrete applications are presented.

http://www.morganclaypool.com/doi/abs/10.2200/S00639ED1V01Y201504HLT027

This title is available online without charge to members of institutions that have licensed the Synthesis Digital Library of Engineering and Computer Science. Members of licensing institutions have unlimited access to download, save, and print the PDF without restriction; use of the book as a course text is encouraged. To find out whether your institution is a subscriber, visit http://www.morganclaypool.com/page/licensed, or just click on the book's URL above from an institutional IP address and attempt to download the PDF. Others may purchase the book from this URL as a PDF download for US$30 or in print for US$40. Printed copies are also available from Amazon and from booksellers worldwide at approximately US$45 or local currency equivalent.

:::: Graeme Hirst • Series editor, Synthesis Lectures on Human Language Technologies
:::: University of Toronto • Department of Computer Science
