SemDis 2013: Current Issues in Distributional Semantics
Workshop associated with the 20th TALN conference
June 21st, 2013, Sables d'Olonne, France
In the course of the last two decades, significant progress has been made with regard to the automatic extraction of semantic knowledge from large-scale text corpora. Most work relies on Harris' distributional hypothesis of meaning, which states that words that appear within the same contexts tend to be semantically related. This principle has inspired a substantial amount of research - mainly for English but also for other languages - and several survey articles have recently helped to consolidate the concepts and procedures used for distributional computations (Sahlgren, 2006; Turney and Pantel, 2010; Baroni and Lenci, 2010). In recent years, the distributional semantic approach has benefited from the availability of massive amounts of textual data and increased computational power, allowing for the application of these methods on a large scale. Still, a number of research topics remain open, with regard to the construction, the evaluation and the application of the semantic information that is induced by these methods.
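As an illustration of the distributional hypothesis mentioned above, the following minimal sketch builds word vectors from co-occurrence counts and compares them with cosine similarity. The toy corpus, the window size of 2, and the function names `cooccurrence_vectors` and `cosine` are assumptions made for this example only; real systems typically add weighting schemes (such as PMI) and dimensionality reduction, and are trained on far larger corpora.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, chosen so that "cat" and "dog" share contexts.
toy_corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the mouse".split(),
    "the dog chases the ball".split(),
]

vectors = cooccurrence_vectors(toy_corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_milk = cosine(vectors["cat"], vectors["milk"])
print(f"cat/dog:  {sim_cat_dog:.3f}")
print(f"cat/milk: {sim_cat_milk:.3f}")
```

On this toy corpus, "cat" and "dog" occur in nearly identical contexts and therefore receive a higher similarity score than "cat" and "milk", which merely occur next to each other.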
Regarding the construction of distributional semantic resources, the nature of the corpus is a key issue, and its impact on the results requires further investigation. Today's trend is to use massive corpora, moving away from Harris' initial hypothesis, which was based on the analysis of small, well-defined, and specialized corpora. A second important issue relates to the modeling of semantic compositionality within a distributional framework, such that not only individual words but also larger phrases can be taken into account (Mitchell and Lapata, 2008; Baroni and Zamparelli, 2010; Grefenstette and Sadrzadeh, 2011).
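To make the notion of distributional compositionality concrete, the sketch below shows two of the simplest composition functions studied in this line of work, element-wise addition and element-wise multiplication of word vectors. The three-dimensional vectors for "red" and "car" are invented values used purely for illustration, and the function names are assumptions; the cited approaches also include richer models that this sketch does not cover.

```python
def compose_additive(u, v):
    """Phrase vector as the element-wise sum of two word vectors."""
    return [a + b for a, b in zip(u, v)]


def compose_multiplicative(u, v):
    """Phrase vector as the element-wise product of two word vectors."""
    return [a * b for a, b in zip(u, v)]


# Hypothetical toy vectors (not derived from any real corpus).
red = [2.0, 0.0, 1.0]
car = [1.0, 3.0, 1.0]

red_car_add = compose_additive(red, car)
red_car_mul = compose_multiplicative(red, car)
print(red_car_add)
print(red_car_mul)
```

Addition keeps every dimension that is active in either word, whereas multiplication keeps only the dimensions on which both words have non-zero values.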
Regarding the evaluation of distributional models, two issues stand out. First, relations between words tend to be very diverse: we need a better understanding of the nature of the semantic relations (synonymy, association, analogy, ...) induced by these models, and of the impact of the distributional parameters on the induced relations (Sahlgren, 2006; Peirsman and Geeraerts, 2009). Second, large corpora generate resources so large that they are very difficult to explore and grasp. The manipulation of graphs within visualization systems suitable for their exploration can improve our knowledge of their content and structure.
Finally, distributional resources are useful for a large number of applications, such as information retrieval, summarization, text segmentation, etc. Distributional features have been incorporated into a wide range of NLP tasks, such as named entity classification and paraphrasing (Kotlerman et al., 2010; Jonnalagadda et al., 2012). Linguists could equally benefit from these distributional approaches, as they provide a means to conduct large-scale studies of the semantic relations that may be discovered from large corpora.
We welcome papers that focus on any of the aforementioned topics, and in particular:
- the construction of distributional semantic resources
- the nature of corpora within distributional semantics
- compositionality within a distributional framework
- the use of distributional resources for linguistic analysis
- the induction of specific semantic relations
- the use of distributional methods within NLP tasks
- optimization techniques for distributional computations
- visualization techniques for word spaces

- Paper submission: March 29th, 2013
- Acceptance notification: April 19th, 2013
- Final version: May 2nd, 2013
Papers should be submitted in PDF format through Easychair: https://www.easychair.org/conferences/?conf=semdis2013
Papers should be written in French or English, should be between 12 and 14 pages long, and must conform to the TALN style sheet, which is available on the conference web site (http://www.taln2013.org/soumettre/). The selection criteria are those defined for the main conference.
Cécile Fabre, CLLE, Toulouse, France
Nabil Hathout, CLLE, Toulouse, France
Philippe Muller, IRIT, Toulouse, France
Tim Van de Cruys, IRIT, Toulouse, France

Stergos Afantenos, IRIT, Toulouse, France
Yves Bestgen, UCL/CECL, Louvain-La-Neuve, Belgium
Marie Candito, ALPAGE, Paris, France
Eric de la Clergerie, ALPAGE, Paris, France
Cécile Fabre, CLLE, Toulouse, France
Olivier Ferret, CEA-LIST, Fontenay-aux-Roses, France
Nabil Hathout, CLLE, Toulouse, France
Philippe Muller, IRIT, Toulouse, France
Adeline Nazarenko, LIPN, Paris, France
Pascale Sébillot, IRISA, Rennes, France
Ludovic Tanguy, CLLE, Toulouse, France
Agnès Tutin, LIDILEM, Grenoble, France
Tim Van de Cruys, IRIT, Toulouse, France
Virginie Zampa, LIDILEM, Grenoble, France
Contact: Cécile Fabre (cecile.fabre at univ-tlse2.fr) and Tim Van de Cruys (tim.vandecruys at irit.fr)
--
Nabil Hathout
CLLE-ERSS (UMR 5263)
CNRS & Université de Toulouse-Le Mirail
Maison de la Recherche. F-31058 Toulouse cedex 9
Tel. (+33) 561-503-603 Fax (+33) 561-504-677