[Corpora-List] Query about corpora of spoken English

Rayson, Paul rayson at exchange.lancs.ac.uk
Mon Dec 12 10:27:01 CET 2005


I've been told by Anne Wichmann and Gerry Knowles that the latest
version of MARSEC is held by Daniel Hirst in Aix-en-Provence:



Dr. Paul Rayson
Director of UCREL
Computing Department, Infolab21, South Drive, Lancaster University,
Lancaster, LA1 4WA, UK.
Web: http://www.comp.lancs.ac.uk/computing/users/paul/
Tel: +44 1524 510357 Fax: +44 1524 510492

-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Paul Thompson
Sent: 04 December 2005 12:35
To: Briony Williams
Cc: R.M.Salkie at bton.ac.uk; CORPORA at uib.no;
nicolas.ballier at lli.univ-paris13.fr
Subject: Re: [Corpora-List] Query about corpora of spoken English

Briony Williams mentions the MARSEC corpus and gives a Reading URL for
information on this. That page is well and truly out of date now, and
Simon Arnfield, whose name is given as the contact person, no longer
works at Reading University.

Reading doesn't have any of the MARSEC resources - can anyone (maybe
someone at Leeds or Lancaster) tell the list what the current state of
MARSEC is?
Paul Thompson

Briony Williams wrote:

> R.M.Salkie at bton.ac.uk wrote:

>> My colleague Nicolas Ballier (nicolas.ballier at lli.univ-paris13.fr)
>> has asked me to post the following two queries. Please reply directly
>> to him.



> It may be useful to others to have the replies in a public forum like
> this one - so here is a quick reply to the CORPORA list.


>> 1. Is there a web page which lists currently available corpora of
>> spoken English (eg MARSEC, MAchine REadable Spoken ENglish Corpus),
>> stating whether the sound files are available?



> You could try the catalogue pages of:

> a) Linguistic Data Consortium - subset "speech":
> http://www.ldc.upenn.edu/Catalog/byType.jsp#speech

> b) Evaluations and Language Resources Distribution Agency:
> http://www.elda.org/rubrique6.html

> c) International Computer Archive of Modern and Medieval English:
> http://nora.hd.uib.no/whatis.html

> d) The MARSEC corpus:
> http://www.rdg.ac.uk/AcaDepts/ll/speechlab/marsec/


>> 2. Is there software available to align texts and sound files, for
>> example, software that enables the user to listen to any part of the
>> document by clicking on a word in the text?



> First the sound file needs to be aligned with the linguistic
> annotation. Some popular applications currently used for doing this
> manually are listed below (there are other applications for automatic
> segmentation of speech files). All of these can be used to click on
> and listen to an individual word once a word-level segmentation has
> been carried out.


> a) Praat (has a very flexible scripting language):
> http://www.fon.hum.uva.nl/praat/

> b) Emu (segment-level and also higher linguistic levels, plus
> hierarchical structure; has some scripting capability for automatic
> building of trees):
> http://emu.sourceforge.net/


> c) Transcriber ("It provides a user-friendly graphical user interface
> for segmenting long duration speech recordings, transcribing them, and
> labeling speech turns, topic changes and acoustic conditions. It is
> more specifically designed for the annotation of broadcast news
> recordings, for creating corpora used in the development of automatic
> broadcast news transcription systems, but its features might be found
> useful in other areas of speech research."):
> http://trans.sourceforge.net/en/presentation.php


> d) MATE workbench ("a program designed to aid in the display, editing
> and querying of annotated speech corpora"):
> http://www.cogsci.ed.ac.uk/~dmck/MateCode/



> These are by no means the only tools available (I have omitted xlabel,
> as it is no longer supported).

> Best regards

> Briony Williams


Dr Paul Thompson
School Director of Postgraduate Studies
Department of Applied Linguistics
School of Languages and European Studies
The University of Reading
Reading RG6 6AA
Tel. +44 118 3786472
URL: www.rdg.ac.uk/app_ling/
