Scanned texts contain errors introduced by imperfect OCR and other causes, so techniques are needed that are robust to such errors. The successful applicant will develop techniques that use typographical and contextual cues to identify and tag relevant document content.
The ideal candidate would have a PhD (or equivalent experience) and experience in one or more of the following:
- natural language processing, information extraction or information retrieval, in particular from noisy data;
- image analysis and feature extraction;
- document layout analysis (reverse-engineering a DTD);
- XML for mark-up and term annotation;
- broad familiarity with biological systematics.
Good programming skills are essential, as is the ability to learn quickly. Applications from candidates with a background in the biological sciences who can demonstrate appropriate computing skills are encouraged.
For further details and information on how to apply, go to www3.open.ac.uk/employment, or email the Recruitment Secretary at MCS-Recruitment at open.ac.uk, quoting the reference number. Closing date: 16th October 2008.
For enquiries about the research project, please contact: David Morse [d.r.morse at open.ac.uk].
--
Dr David R. Morse
Computing Department, The Open University,
Walton Hall, Milton Keynes MK7 6AA, UK.
Email: D.R.Morse at open.ac.uk | Phone: +44 (0)1908 858463 | Fax: +44 (0)1908 652140