[Corpora-List] jobs in Cambridge: NLP for eScience

Ann Copestake Ann.Copestake at cl.cam.ac.uk
Thu May 26 09:36:04 CEST 2005


Ref No: NR115 & 116
Salary: £19,460 - £29,127 pa
Limit of tenure: Up to forty-eight months

Applications are invited for two Research Associates to develop natural
language processing technology for eScience. The project aims to develop a
natural-language oriented markup language which enables the tight integration
of partial information from a wide variety of language processing tools. This
language will be compatible with GRID and Web protocols and will have a sound
logical basis consistent with Semantic Web standards. This will be used for
robust and extensible extraction of information from scientific texts and to
model scientific argumentation and citation purpose in order to support novel
modes of information access. We will demonstrate the applicability of this
infrastructure on Chemistry texts.

The project is a collaboration between the NLIP group in the Computer
Laboratory (A. Copestake, S. Teufel: http://www.cl.cam.ac.uk/Research/NL/), the
Unilever Centre for Molecular Informatics in the Department of Chemistry (P.
Murray-Rust: http://www-ucc.ch.cam.ac.uk/) and the Cambridge eScience Centre
(A. Parker: http://www.escience.cam.ac.uk/).

This project will build on existing technology for the analysis of natural
language text and for representation of data in Chemistry texts. Proposed
start date: 1 October 2005 or as soon as possible thereafter.

Post 1 (Computer Laboratory):
Research in combining deep and shallow processing techniques, parsing of
Chemistry texts, discourse analysis, word sense disambiguation and anaphora
resolution with respect to ontologies. Coordination with Chemistry on
application of technology to Chemistry texts and with the eScience Centre on
development of high throughput techniques.

A PhD or equivalent experience in computational linguistics/natural language
engineering is required. Relevant topics include computational semantics,
anaphora resolution, word sense disambiguation, shallow/deep parsing,
information extraction and ontology extraction. Broad interests and a
demonstrated ability to apply theoretical research to large corpora will be
an advantage.

As the research will build on an extensive existing code base mostly
implemented in C or Common Lisp, strong programming skills in these languages
in a Unix/Linux environment are essential (Perl and C++ are also relevant).
Knowledge of internet technologies would be an advantage.

Post 2 (Chemistry):
Research in chemical ontology based on XML, RDF and Semantic Web technology,
recognition and parsing of chemical terms, user interface design, coordination
with Computer Laboratory, and interaction with chemical publishers.

A PhD or equivalent experience in chemical informatics, computational
chemistry or equivalent is required. Experience in some of the following in
chemistry: searching, GUIs, data mining, high-throughput computation and the
GRID. Programming experience in a modern language (Java, C++) is essential.

For further information/job description, please e-mail
Simone.Teufel at cl.cam.ac.uk, or visit http://www.cl.cam.ac.uk/DeptInfo/Jobs/

Applicants should send a cover letter, a completed PD18 form
(http://www.admin.cam.ac.uk/offices/personnel/forms/pd18/), a full CV and the
names and addresses of three academic/professional referees to Simone Teufel,
Computer Laboratory, JJ Thomson Avenue, Cambridge, CB3 0FD. Closing date: 20
June 2005. Interview date: week of 4-8 July 2005.
