[Corpora-List] Workshop Call for papers: Compilation and annotation of spoken corpora: Towards best practise?

Gisle Andersen Gisle.Andersen at nhh.no
Thu Dec 13 10:32:42 CET 2012

Dear colleagues,

I am happy to announce the following pre-conference workshop at the ICAME conference <http://www.usc.es/en/congresos/icame34/index.html> in Santiago de Compostela on Wednesday 22 May 2013.

Kind regards, Gisle Andersen

ICAME34 Santiago de Compostela
Pre-conference workshop / Call for papers
Workshop convenors: Gisle Andersen (NHH-NO), John Kirk (QUB-UK), Susan Lee Nacey (HiHm-NO)

Compilation and annotation of spoken corpora: Towards best practise?
http://www.usc.es/en/congresos/icame34/workshops.html

This workshop provides a meeting ground for scholars involved in the creation of corpora of spoken language or with a more general interest in the representation of spoken data based on audio/video recordings. The workshop addresses the need to harmonise corpus-building methods by developing or utilising internationally recognised standards in corpus linguistics or best-practice guidelines for the transcription and annotation of audio/video data.

The aim is to facilitate the exchange of experience from large-scale, coordinated corpus-building efforts as well as small-scale, local initiatives. This includes accounts of, on the one hand, the practicalities encountered in corpus compilation, transcription and annotation and, on the other hand, how annotation decisions are grounded in linguistic theory. We hope this will stimulate a fruitful discussion of whether and how cross-corpus comparison is hampered by a lack of uniformity in annotation schemes and procedures, what solutions corpus builders recommend at different annotation levels, what practical experience they have with existing standards or de facto standards (e.g. COBUILD/NERC, TEI, XCES), and which methods are available for testing and improving inter-annotator agreement. Relevant topics include, but are not restricted to:

* Corpus design (techniques for capturing and linking text and audio/video data; ensuring consistency in transcription; ensuring inter-annotator agreement)
* Orthographic transcription (transcription of non-standard vocabulary, slang, swearing, neologisms; standardised vs. idiosyncratic orthography; standardised representation of pauses, backchannels and hesitation phenomena)
* Annotation of syntactic features (the relevance and reliability of part-of-speech tagging for (informal/messy) conversational data; syntactic parsing of speech; parsers'/taggers' capability of handling non-standard forms and neologisms)
* Annotation of prosodic, phonetic, or acoustic features (standardised vs. in-house annotation schemes, simple vs. detailed prosodic annotation; the relevance and reliability of phonetic annotation)
* Pragmatic or gestural annotation (standardised/in-house systems for annotation of speech act information, discourse functions, pragmatic markers, quotatives, anaphora and deixis; gestural annotation schemes)

We invite papers that discuss specific corpus initiatives dealing with any of the above topics, or that report on corpus-based case studies which illustrate or problematise the need for methodological harmonisation and standardisation in the field. The workshop will be organised as a series of thematic slots consisting of 15-minute papers followed by joint discussions.

The deadline for abstract submission is 31 January 2013. Abstracts of 300-400 words should be submitted by e-mail to all three convenors: gisle.andersen at nhh.no, jk at etinu.com and susan.nacey at hihm.no. The notification of acceptance will be sent out in late February 2013.
