[Corpora-List] PhD Studentship, University of Edinburgh

Mirella Lapata mlap at inf.ed.ac.uk
Mon Mar 21 18:36:00 CET 2005

School of Informatics, University of Edinburgh

The Institute for Communicating and Collaborative Systems (ICCS) within
the Division of Informatics and the Human Communication Research
Centre (HCRC) invite applications for a three-year EPSRC studentship
award to commence in September 2005. The successful applicant will
work on a project aiming to devise unsupervised models for word sense
disambiguation. A brief summary of the aims of this project is given
below.

Graphical Models for Word Sense Disambiguation

The most accurate techniques for word sense disambiguation (WSD) to
date are those which are trained on text in which each word has been
manually annotated with its intended sense. A major shortcoming of
these methods, though, is that accuracy is strongly correlated with
the quantity of training data available, and such data is in short
supply because producing it is very labour-intensive. For many words the
distribution of their senses is highly skewed and WSD systems work
best when they take the most frequent sense into account. However, the
most frequent sense of a word is often not known, particularly in
domains (subject areas) in which no text has ever been manually
annotated.

This project is concerned with developing novel algorithms for
alleviating the data requirements for large-scale WSD. More
specifically, the project will involve:

o Exploring the use of probabilistic graphical models for word sense
disambiguation. Graphical models are a powerful modeling framework
that is well-suited for characterizing and studying the interactions
among varied information sources, thus allowing many aspects of the
WSD problem to be represented concurrently.

o Devising sense ranking models for structured (e.g., WordNet) and
unstructured (e.g., dictionary definitions) sense inventories.

o Demonstrating the benefit of unsupervised WSD when applied to
Question Answering.

The EPSRC baseline rate of maintenance is currently approx. £12,000
and the studentship will also pay the three years' tuition fees at
home/EU rates. Applicants should have a good honours degree or
equivalent in Computer Science or Computational
Linguistics. Programming skills, preferably in Perl, Java, C or C++,
are essential. Familiarity with statistical NLP, machine learning
methods and corpus processing is an advantage.

The project will be conducted in collaboration with the Natural
Language and Computational Linguistics (NLCL) group at the University
of Sussex (see http://www.informatics.susx.ac.uk/research/nlp/). ICCS
and HCRC have close research links with a number of other academic
institutions (e.g., Saarland University, DFKI, Stanford University)
and companies from which the student will benefit.

For further information about the project please e-mail Dr. Mirella
Lapata (mlap at inf.ed.ac.uk). Application forms and details of how to
apply are on-line at
PLEASE MARK "Graphical Models for Word Sense Disambiguation" ON THE

Application deadline: Monday May 2nd 2005.
Applications received after this deadline may be considered, but this
cannot be guaranteed.
