[Corpora-List] Summary of responses: Pragmatic annotations

Victoria López mavilos at terra.es
Wed Sep 14 10:42:00 CEST 2005


Two weeks ago I posted a question about pragmatic annotations. Thanks to all
of those who responded. Here's a brief summary.

'Further levels of annotation' by Geoffrey Leech, Tony McEnery and
Martin Wynne, in Corpus Annotation, edited by Roger Garside, Geoffrey
Leech and Anthony McEnery, Longman, Harlow, 1997.

ACL workshop on discourse annotation?


Some exploratory experiments regarding
general-knowledge-based cohesion in texts:

Beigman Klebanov, B., 2005.
"Using Readers to Identify Lexical Cohesive Structures in Texts"
In Proceedings of ACL-2005 Student Session, Ann Arbor, USA, June 2005,
pp. 55-60.

The annotation guidelines we've given to the subjects can be found on my
webpage: http://www.cs.huji.ac.il/~beata

See the work of Samuels et al. at COLING in Montreal (1998?). It has gone
quite a way since then, with lots of people joining in. Below are a few
references to work at Sheffield which gets good results from rather simpler
classifier training than is usual:

Webb, N., M. Hepple and Y. Wilks (2005)
Error Analysis of Dialogue Act Classification, in Proceedings of the 8th
International Conference on Text, Speech and Dialogue, Carlsbad, Czech
Republic.

Webb, N., M. Hepple and Y. Wilks (2005)
Empirical determination of thresholds for optimal dialogue act
classification, in Proceedings of the Ninth Workshop on the Semantics and
Pragmatics of Dialogue (SemDial), Nancy.

Webb, N., M. Hepple and Y. Wilks (2005)
Dialogue Act Classification using Intra-Utterance Features, in Proceedings
of the AAAI Workshop on Spoken Language Understanding, Pittsburgh.

Webb, N., H. Hardy, C. Ursu, M. Wu, T. Strzalkowski and Y. Wilks (2005)
Data-Driven Language Understanding for Spoken Language Dialogue, in
Proceedings of the AAAI Workshop on Spoken Language Understanding,
Pittsburgh, 2005.

I don't know if you count coreference as pragmatics, but you could look at
Aone and Bennett's (1994) Discourse Tagging Tool, the Alembic Workbench, and
CLinkA. There was also a workshop at ACL on frontiers in annotation
(http://nlp.cs.nyu.edu/meyers/frontiers/2005.html) which might have some
useful pointers.

Popescu-Belis et al 2003, A Thematic Bibliography on Dialogue Processing.
Section 3.4 on Dialogue data and annotation.

Dhillon et al, 2004, Meeting Recorder Project: Dialogue Act Labeling Guide

Stolcke et al 2000, Dialogue Act Modeling for Automatic Tagging and
Recognition of Conversational Speech. Computational Linguistics 26(3).

Jurafsky et al, 1997, Switchboard SWBD-DAMSL Shallow-Discourse-Function

Carletta et al 1996, HCRC Dialogue Structure Coding Manual


May I call your attention to work we at Tilburg University have done on
classifying dialogue acts in spoken dialogues.
We applied machine learning to a Dutch corpus of human-machine dialogues
conducted with a spoken dialogue system.
We used a small, domain-specific tagset that covered different aspects
of pragmatic and semantic phenomena.

You may find our related publications at
http://ilk.kub.nl/~piroska/research.htm, such as:

# P. Lendvai, A. van den Bosch: /Robust ASR lattice representation types
in pragma-semantic processing of spoken input./ In: Proc. of the AAAI
Spoken Language Understanding Workshop, SLU-2005, Pittsburgh, PA, 2005,
pages 15-22.

# P. Lendvai: /Extracting Information from Spoken User Input. A Machine
Learning Approach./ Ph.D. thesis, Tilburg University, Netherlands, 2004.

# P. Lendvai, A. van den Bosch, E. Krahmer: /Machine Learning for Shallow
Interpretation of User Utterances in Spoken Dialogue Systems./ In: Proc.
of EACL-03 Workshop on Dialogue Systems: interaction, adaptation and
styles of management. Budapest, Hungary, 2003, pages 69-78.

# P. Lendvai, A. van den Bosch, E. Krahmer, M. Swerts: /Multi-feature
error detection./ In: Theune, M., Nijholt, A. & Hondorp, H. (Eds.),
Language and Computers: Studies in Practical Linguistics. (pp. 163-178).
Amsterdam: Rodopi. 2002.

# P. Lendvai, A. van den Bosch, E. Krahmer, M. Swerts:
/Improving machine-learned detection of miscommunications in
human-machine dialogues through informed data splitting./ In: Kuebler,
S. & Hinrichs, E. (Eds.), Machine Learning Approaches in Computational
Linguistics. (pp. 1-15). Trento, Italy: ESSLLI. 2002.


There's an article by Lampert and Ervin-Tripp in _Talking Data:
Transcription and Coding in Discourse Research_, 1993 (edited by Martin
Lampert and me), which describes principles for designing, implementing and
evaluating a system of codes (including intercoder reliability). It is
illustrated by examples of coding control acts in children.

For an array of different types of coding,
I'd recommend the deliverables from the MATE project, which are
available online.
