Manchester, UK, 23 or 24 August 2008 (to be determined)
Deadline for submission: 5 May 2008
Human judgements play a key role in the development and assessment of linguistic resources and methods in Computational Linguistics. They are commonly used in the creation of lexical resources and in corpus annotation, as well as in the evaluation of automatic approaches to linguistic tasks. Furthermore, systematically collected human judgements provide clues for research on the linguistic issues that underlie the judgement task, offering insights complementary to introspective analysis and to evidence gathered from corpora.
We invite papers about experiments that collect human judgements for Computational Linguistics purposes, with a particular focus on linguistic tasks that are controversial from a theoretical point of view (e.g., some coding tasks having to do with semantics or pragmatics). Such experimental tasks are usually difficult to design and interpret, and they typically result in mediocre inter-rater reliability. We seek both broad methodological papers discussing these issues and specific case studies.
Topics of interest include, but are not limited to:
* Experimental design:
- Which types of experiments support the collection of human
judgements? Can any general guidelines be defined? Is there a
preference between lab-based and web-based experiments?
- Which experimental methodologies support controversial tasks? For
instance, does underspecification help? What is the role of
ambiguity and polysemy in these tasks?
- What is the appropriate level of granularity for the category
- What kind of participants should be used (e.g., expert
vs. non-expert), how is it affected by the type of experiment, and
how should the experiment design be varied according to this
- How much and which kind of information (examples, context, etc.)
should be provided to the experiment participants? When does
information turn into a bias?
- Is it possible to design experiments that are useful for both
computational linguistics and psycholinguistics? What do the two
research areas have in common? What are the differences?
* Analysis and interpretation of experimental data:
- How important is inter-annotator agreement in human judgement
collection experiments? How is it best measured for complex tasks?
- What other quantitative tools are useful for analysing human
judgement collection experiments?
- What qualitative methods are useful for analysing human judgement
collection experiments? Which questions should be asked? Is it
possible to formulate general guidelines?
- How is the analysis similar to psycholinguistic analysis? How is
it different?
- How do results from all of the methods above affect the
development of annotation instructions and procedures?
* Application of experiment insights:
- How do the experimental data fit into the general
- How should the set of labels and the criteria or guidelines for
the annotation task be modified according to the experimental
results? How can circularity be avoided in this process?
- How can the data be used to refine or modify existing theoretical
- More generally, under what conditions can the obtained judgements
be applied to research questions?
Ron Artstein, Institute for Creative Technologies, University of Southern California
Gemma Boleda, Universitat Politècnica de Catalunya
Frank Keller, University of Edinburgh
Sabine Schulte im Walde, Universität Stuttgart
Martha Palmer, University of Colorado
Toni Badia, Universitat Pompeu Fabra
Marco Baroni, University of Trento
Beata Beigman Klebanov, Northwestern University
André Blessing, Universität Stuttgart
Chris Brew, Ohio State University
Kevin Cohen, University of Colorado Health Sciences Center
Barbara Di Eugenio, University of Illinois at Chicago
Katrin Erk, University of Texas at Austin
Stefan Evert, University of Osnabrück
Afsaneh Fazly, University of Toronto
Alex Fraser, Universität Stuttgart
Jesus Gimenez, Universitat Politècnica de Catalunya
Roxana Girju, University of Illinois at Urbana-Champaign
Ed Hovy, University of Southern California
Nancy Ide, Vassar College
Adam Kilgarriff, University of Brighton
Alexander Koller, University of Edinburgh
Anna Korhonen, University of Cambridge
Mirella Lapata, University of Edinburgh
Diana McCarthy, University of Sussex
Alissa Melinger, University of Dundee
Paola Merlo, University of Geneva
Sebastian Padó, Stanford University
Martha Palmer, University of Colorado
Rebecca Passonneau, Columbia University
Massimo Poesio, University of Trento
Sameer Pradhan, BBN Technologies
Horacio Rodriguez, Universitat Politècnica de Catalunya
Bettina Schrader, Universität Potsdam
Suzanne Stevenson, University of Toronto
The deadline for receipt of papers is 5 May 2008, 23:59 UTC. For submission information, see the following web page:
Paper submission deadline: 5 May 2008
Notification of acceptance: 10 June 2008
Camera-ready copy due: 1 July 2008
Workshop date: 23 or 24 August 2008 (to be determined)