Title: Hybrid Question Answering over Heterogeneous Data
Laboratories: LRI, CNRS UMR 8623, Université Paris-Saclay, France; LIMSI, CNRS, Université Paris-Saclay, France
Supervisors: Brigitte Grau (LIMSI); Yue Ma (LRI)
Project context: GoASQ (ANR international project with TU Dresden, Germany)
Financial support: ANR
Start Date: as soon as possible (latest December 1, 2016)
Duration: three years
Application Deadline: June 25, 2016
***Motivations***
More and more information on individuals (e.g., persons, events, biological objects) is available electronically in structured or semi-structured form. However, manually selecting individuals that satisfy complex constraints is difficult, error-prone, and costly in time and personnel. Tools that can (semi-)automatically answer questions over heterogeneous data therefore need to be developed, as exemplified by the IBM Watson system. This PhD project addresses the instance extraction problem for applications that involve rich background domain knowledge, such as searching electronic patient records for patients who satisfy non-trivial combinations of properties, e.g., eligibility criteria for clinical trials. We call this task complex question answering.

While simple questions can be expressed and answered directly using natural language keywords, complex questions that refer to type and relational information increase the precision of the retrieved results and thus reduce the effort of subsequent manual verification. Formal queries are powerful in this context, both for representing complex questions and for exploring background knowledge; however, they are often difficult to master, which makes such an advanced answering system impractical without a user-adapted interface. To resolve this problem, the project will let a user formulate her need as natural language questions, which can be complex pieces of text. Beyond offering an easier interface, natural language makes it possible to state constraints that cannot be represented formally because of the expressiveness limits of formal languages, but that can be verified directly against textual data.
***Ph.D Work***
To achieve complex question answering, this PhD project will develop a novel question answering paradigm that integrates formal, database-like query answering with text-based question answering built on information extraction methods. Both are important approaches to complex question answering, and each has its own advantages. To benefit from both, a key contribution of this PhD work will be approaches for combining the answers to a formal query with answers found by information retrieval techniques, which has been identified as a challenge for question answering systems. The work will study hybrid complex question answering systems that take into account the limits of ontological reasoning and of text processing when each is used alone. In particular, the following approaches need to be developed:
- Text-for-ontology search: selecting relevant cases by text-based retrieval, in order to define a subset of individuals and so reduce the computational complexity of formal queries.
- Ontology-driven search: querying the populated ontology to select potentially relevant individuals and their related texts, then reranking these individuals by verifying the remaining unstructured information about them.
- Hybrid answer production: producing the final answers to a question by comparing and then combining the results of the ontology-based reasoning method and the text-based processing method.
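
As a purely illustrative sketch of how the three approaches above might fit together, the following Python fragment filters candidates by text, applies a structured filter that stands in for a formal query, and reranks by text evidence. All names (`Patient`, `text_prefilter`, `formal_query`, `text_score`, `hybrid_answer`), the sample records, and the keyword-matching score are assumptions made for this example only; the project itself does not prescribe any of them, and a real system would rely on ontology reasoning and information extraction rather than substring matching.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record mixing structured fields and free text."""
    pid: str
    age: int          # structured: can be constrained by a formal query
    diagnosis: str    # structured
    notes: str        # unstructured clinical text


def text_prefilter(patients, keywords):
    """Text-for-ontology search: keep individuals whose notes mention
    at least one keyword, shrinking the set the formal query must examine."""
    kws = [k.lower() for k in keywords]
    return [p for p in patients if any(k in p.notes.lower() for k in kws)]


def formal_query(patients, min_age, diagnosis):
    """Stand-in for a formal (database-like or ontology) query over
    structured data. Assumption: a plain filter replaces real reasoning."""
    return [p for p in patients if p.age >= min_age and p.diagnosis == diagnosis]


def text_score(patient, keywords):
    """Fraction of keywords found in the notes (toy text-based evidence)."""
    notes = patient.notes.lower()
    return sum(k.lower() in notes for k in keywords) / len(keywords)


def hybrid_answer(patients, min_age, diagnosis, keywords):
    """Hybrid answer production: combine the formal result set with
    text-based scores by reranking the formally matched individuals."""
    candidates = text_prefilter(patients, keywords)
    matched = formal_query(candidates, min_age, diagnosis)
    return sorted(matched, key=lambda p: text_score(p, keywords), reverse=True)


# Invented sample data, for illustration only.
PATIENTS = [
    Patient("p5", 65, "diabetes", "No history of smoking."),
    Patient("p1", 67, "diabetes", "No history of smoking. Stable renal function."),
    Patient("p2", 70, "diabetes", "Former smoker."),
    Patient("p3", 45, "diabetes", "No history of smoking."),
    Patient("p4", 80, "asthma", "Stable renal function, no history of smoking."),
]
KEYWORDS = ["no history of smoking", "stable renal function"]

if __name__ == "__main__":
    for p in hybrid_answer(PATIENTS, 60, "diabetes", KEYWORDS):
        print(p.pid, text_score(p, KEYWORDS))
```

In this toy setting the structured constraints (age, diagnosis) play the role of the formal query, while the textual criteria are the kind of constraint that can only be checked against the notes.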
***Required profile***
- Master in Computer Science or a related domain
- Knowledge of Semantic Web, Information Extraction, and/or Artificial Intelligence is required.
- Background in Natural Language Processing, Automated Reasoning, or Information Retrieval is desired.
- Programming: Java, Python
- Language: good level of English; French is not required
- Ability to work in a team and motivation for multidisciplinary studies
***Documents required for application***
- CV, motivation letter, and recommendation letters
- Transcripts for Master and undergraduate courses
Please send your applications to brigitte.grau_at_limsi.fr and yue.ma_at_lri.fr as soon as possible.
--
http://perso.limsi.fr/Individu/bg/
Groupe ILES - LIMSI
Bât. 508, rue John von Neumann
91405 ORSAY Cedex
tel. 01 69 85 80 03, fax 01 69 85 80 88

ENSIIE
Ecole Nationale Supérieure d'Informatique pour l'Industrie et l'Entreprise
1 square de la résistance, 91025 EVRY Cedex
tel. 01 69 36 73 44, fax 01 69 36 73 09