[Corpora-List] Question concerning audio file search

Hong Huaqing huaqing at i2r.a-star.edu.sg
Wed Dec 20 17:33:00 CET 2006

Briony Williams wrote:
"Is there an existing Windows-based software application that will do the following?"

I haven't heard of any freely available Windows-based application of the kind you are looking for. We have developed a web-based query package that lets users search the transcribed classroom discourse and the time-aligned audio/video data of the SCoRE (Singapore Corpus of Research in Education) corpus project. You may want to have a look at the demo version at http://score.crpp.nie.edu.sg. Please note that it is only a prototype query engine, and we are currently redesigning it.

Alternatively, as your data are mainly .trs transcripts and Transcriber-segmented audio files, you may want to try ONZE Miner (http://www.ling.canterbury.ac.nz/jen/onzeminer/). However, ONZE Miner has very limited functionality: it can only search for strings in the .trs transcripts, and it cannot search any annotations or features in the texts.

Good luck!

Huaqing Hong
National Institute of Education
Nanyang Technological University


From: owner-corpora at lists.uib.no on behalf of Briony Williams
Sent: Wed 12/20/2006 11:17 PM
To: CORPORA at uib.no
Subject: [Corpora-List] Question concerning audio file search

sato hiroaki wrote:

> I've just made a software tool for using DVD movies as a multimedia corpus.

Following the publicising of this useful-looking tool, I wonder whether any
members of the Corpora list could help me with a related task, concerning
audio rather than video files. I'm asking on behalf of someone else.

Is there an existing Windows-based software application that will do the
following? (preferably free of charge, and without a large memory requirement):-

1) Given: several very long .wav sound files (possibly an hour long), with
associated text transcript files (.trs files) as produced using the
"Transcriber" software application.

2) User input: User types one or more words to search for in the
transcription files.

3) Output: Software returns each chunk of text that contains the search
string (where "chunk" could be a phrase, sentence, paragraph, topic, or
larger, depending on the granularity of the transcription files).

4) User input: User selects one of the search results.

5) Output: Software plays back the portion of the (large) sound file
corresponding to the chunk selected by the user.
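
Steps 1) to 5) above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not an existing tool: it assumes that each .trs file is Transcriber XML in which every <Turn> element carries startTime and endTime attributes in seconds, and it treats one turn as one "chunk". The function names search_turns and extract_clip are hypothetical.

```python
import wave
import xml.etree.ElementTree as ET


def search_turns(trs_path, query):
    """Return (start, end, text) for every <Turn> whose text contains query.

    Assumption: times are stored in seconds in the startTime/endTime
    attributes of each <Turn> element.
    """
    root = ET.parse(trs_path).getroot()
    hits = []
    for turn in root.iter("Turn"):
        # Collect all text inside the turn and normalise whitespace.
        text = " ".join("".join(turn.itertext()).split())
        if query.lower() in text.lower():
            start = float(turn.get("startTime"))
            end = float(turn.get("endTime"))
            hits.append((start, end, text))
    return hits


def extract_clip(wav_path, start, end, out_path):
    """Copy the audio between start and end (seconds) into a new .wav file."""
    with wave.open(wav_path, "rb") as src:
        rate = src.getframerate()
        first = int(start * rate)
        count = int(end * rate) - first
        # Seek directly to the first frame, so the long file is never
        # loaded into memory as a whole.
        src.setpos(first)
        frames = src.readframes(count)
        with wave.open(out_path, "wb") as dst:
            dst.setparams(src.getparams())
            dst.writeframes(frames)
```

A caller would list the results of search_turns, let the user pick one, and pass its start and end times to extract_clip. On Windows the resulting clip could then be played with the standard winsound module, for example winsound.PlaySound(out_path, winsound.SND_FILENAME). Because only the selected frames are read, the memory requirement stays small even for hour-long recordings.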

I can think of a way to do this using Cygwin and Edinburgh Speech Tools - but
does anyone know of an existing solution using a Windows graphical interface?
My contact seems to prefer a point-and-click interface if possible.

Thanks in advance for any responses.

Best regards

Briony Williams

Arweinydd Tîm Technoleg Lleferydd / Speech Technology Team Leader
Uned Technolegau Iaith / Language Technologies Unit
Canolfan Bedwyr / Canolfan Bedwyr
Prifysgol Cymru / University of Wales
Bangor / Bangor
Gwynedd LL57 2EN, UK / Gwynedd LL57 2EN, UK

E-Bost / E-Mail : b.williams at bangor.ac.uk
Gwe (Cymraeg) : http://www.bangor.ac.uk/ar/cb/technolegau_iaith.php.cy
Web (English) : http://www.bangor.ac.uk/ar/cb/technolegau_iaith.php.en
Ffôn / Tel : +44 (0) 1506 200862
Rhithfro / Blog : http://murmur.bangor.ac.uk

This email and any attachments may contain confidential material and
is solely for the use of the intended recipient(s). If you have
received this email in error, please notify the sender immediately
and delete this email. If you are not the intended recipient(s), you
must not use, retain or disclose any information contained in this
email. Any views or opinions are solely those of the sender and do
not necessarily represent those of the University of Wales, Bangor.
The University of Wales, Bangor does not guarantee that this email or
any attachments are free from viruses or 100% secure. Unless
expressly stated in the body of the text of the email, this email is
not intended to form a binding contract - a list of authorised
signatories is available from the University of Wales, Bangor Finance
Office. www.bangor.ac.uk
