SWAIE 2014: Semantic Web and Information Extraction
http://swaie2014.wordpress.com
24th August 2014
Full-day workshop in conjunction with COLING 2014
Deadline Extension: 15 May 2014, 23:59 Hawaii Time
****************************************************************
INTRODUCTION
There is a vast wealth of information available in textual format that the Semantic Web cannot yet tap into: 80% of data on the Web and on internal corporate intranets is unstructured, hence analysing and structuring the data - social analytics and next generation analytics - is a large and growing endeavour. Here, the Information Extraction community could help, as it specialises in mining the nuggets of information from text. Information Extraction techniques could in turn be enhanced by annotated data or domain-specific resources. The Semantic Web community has taken great strides in making these resources available through the Linked Open Data cloud, and they are now ready for uptake by the Information Extraction community. Following the previous two SWAIE workshops, at EKAW 2012 and RANLP 2013 respectively, we are focusing our attention on fostering awareness of how Semantic Web technologies can benefit the traditional IE and NLP communities. We invite contributions around three particular topics: 1) Semantic Web-driven Information Extraction, 2) Information Extraction for the Semantic Web, and 3) applications and architectures on the intersection of Semantic Web and Information Extraction.
MOTIVATION
The Semantic Web aims to add a machine-tractable, repurposable layer to complement the existing web of natural language hypertext. In order to realise this vision, the creation of semantic annotation, the linking of Web pages to ontologies and the creation, evolution and interrelation of ontologies must become automatic or semi-automatic processes. Information Extraction, a form of natural language analysis, is becoming a central technology for linking Semantic Web models with documents. Conversely, traditional Information Extraction can be enhanced by the addition of semantic information, enabling disambiguation of concepts, reasoning and inference to take place over the documents. The primary goal of this workshop is to advance the understanding of the relationship between Information Extraction and the Semantic Web.
With the adoption of the Web 2.0 paradigm, these technologies face further challenges because of their inherent multi-source nature, while the rapidly increasing use of social media brings a new set of problems in dealing with degraded forms of text, with incorrect grammar, spelling and so on. Information Extraction now has to deal not just with isolated texts or single narratives but with large scale repositories or sources -- in one or many languages -- containing a multiplicity of views, opinions, or commentaries on particular topics, entities or events, in very diverse styles and formats. New methods and tools thus need to be developed to deal with the changing face of data and the changing needs of society. Furthermore, traditional platforms and architectures for Information Extraction are not necessarily capable of smoothly handling the transition to more semantic forms of annotation. While language analysis tools may not require sophisticated ontology handling mechanisms, the ensuing lack of interoperability can be problematic when embedding such tools and platforms in Semantic Web architectures. The general theme of the workshop can be seen as an extensive application area for Semantic Web technologies aimed at generating and exploiting semantically rich data, and is thus a critical area of interest to the COLING community. Furthermore, the multidisciplinary nature of the workshop will allow researchers from several distinct though highly related sub-communities to interact with respect to early ideas, work in progress and comprehensive research results.
TOPICS TO BE ADDRESSED
We welcome high-quality papers on current trends in the areas given in the following non-exhaustive list of topics. We seek application-oriented papers as well as more theoretical papers and position papers.
1. Semantic Web-driven Information Extraction
• Integrating ontologies/Linked Open Data with Language Resources
• Enriching Information Extraction systems with Semantic Web data/technologies
• Complex Semantic Web-driven Information Extraction tasks, e.g. relation extraction, event extraction
• Methods and metrics for evaluation of semantic annotations with respect to ontologies
• Incorporating semantics into Machine Learning approaches
• Recognition and representation of temporal information and dynamics
• Data aggregation, consolidation and enrichment
• Ontology-driven entity disambiguation and resolution
2. Information Extraction for the Semantic Web
• Extraction from unstructured versus semi-structured textual sources
• Dealing with the imperfections of Information Extraction techniques in the Semantic Web setting and their impact
• Multi-source or multilingual Information Extraction for ontology population
• Information extraction subtasks (e.g., terminology extraction, relation extraction, coreference resolution) for the Semantic Web
• Methods and metrics for evaluation of Information Extraction for the Semantic Web
3. Applications and Architectures
• Ontology-based Information Extraction for specific domains and applications, e.g. business analytics, healthcare and biomedicine, cultural heritage etc.
• Information Extraction for social media mining
• Scalability of tools and resources
• Platforms and architectures for automatic and semi-automatic semantic annotation
• Tools and methodologies for building and managing complex processing workflows
Workshop papers submission deadline: 15th May 2014
Workshop paper acceptance notification: 6th June 2014
Camera-ready deadline: 27th June 2014
Workshop: 24th August 2014
Each submission should explicitly address one or more of the three main topics and should not exceed 8 pages including references. In addition to presenting specific results, the paper should discuss the more general implications for the topics and/or subtopics that it addresses. Where feasible, contributions should include a system demonstration that illustrates the key ideas of the work and encourages interactive discussion at the workshop. All submissions must be in PDF format and must follow the COLING template. Contributions must be submitted through the START website at https://www.softconf.com/coling2014/WS-7/
There will also be an invited talk, a poster session, and an opportunity to present late-breaking work or novel ideas as a 2-minute lightning talk during the afternoon; these topics may be the stimulus for further debate during the open discussion period.
Please direct any questions regarding the workshop to brian.davis at deri.org
Diana Maynard, University of Sheffield
Marieke van Erp, VU University Amsterdam
Brian Davis, DERI Galway
--
Computational Lexicology & Terminology Lab (CLTL)
The Network Institute, VU University Amsterdam
De Boelelaan 1105
1081 HV Amsterdam, The Netherlands
http://www.mariekevanerp.com
http://www.newsreader-project.eu