[Corpora-List] June 2017 Newsletter -- LDC

Penn LDC ldc at ldc.upenn.edu
Fri Jun 16 15:13:53 CEST 2017


New publications:

Abstract Meaning Representation (AMR) Annotation Release 2.0<https://catalog.ldc.upenn.edu/LDC2017T10>

CHiME2 WSJ0<https://catalog.ldc.upenn.edu/LDC2017S10>

UCLA High-Speed Laryngeal Video and Audio<https://catalog.ldc.upenn.edu/LDC2017V01>

___________________________________________________

(1) Abstract Meaning Representation (AMR) Annotation Release 2.0<https://catalog.ldc.upenn.edu/LDC2017T10> was developed by LDC, SDL/Language Weaver, Inc.<http://www.sdl.com/>, the University of Colorado's Computational Language and Educational Research<http://clear.colorado.edu/start/index.html> group and the Information Sciences Institute<http://www.isi.edu/home> at the University of Southern California. It contains a sembank (semantic treebank) of over 39,260 English natural language sentences from broadcast conversations, newswire, weblogs and web discussion forums.

AMR captures "who is doing what to whom" in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a tree-structure. AMR utilizes PropBank frames, non-core semantic roles, within-sentence coreference, named entity annotation, modality, negation, questions, quantities, and so on to represent the semantic structure of a sentence largely independent of its syntax.

LDC also released Abstract Meaning Representation (AMR) Annotation Release 1.0 (LDC2014T12<https://catalog.ldc.upenn.edu/LDC2014T12>). Abstract Meaning Representation (AMR) Annotation Release 2.0 is distributed via web download. 2017 Subscription Members will automatically receive copies of this corpus. 2017 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

*

(2) CHiME2 WSJ0<https://catalog.ldc.upenn.edu/LDC2017S10> was developed as part of The 2nd CHiME Speech Separation and Recognition Challenge<http://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/> and contains approximately 166 hours of English speech from a noisy living room environment. The CHiME Challenges focus on distant-microphone automatic speech recognition (ASR) in real-world environments.

CHiME2 WSJ0 reflects the medium vocabulary track<http://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/chime2_task2.html> of the CHiME2 Challenge. The target utterances were taken from CSR-I (WSJ0) Complete (LDC93S6A<https://catalog.ldc.upenn.edu/LDC93S6A/>), specifically, the 5,000 word subset of read speech from Wall Street Journal news text. Data is divided into training, development and test sets and includes baseline scoring, decoding and retraining tools. CHiME2 WSJ0 is distributed via web download. 2017 Subscription Members will automatically receive copies of this corpus. 2017 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

*

(3) UCLA High-Speed Laryngeal Video and Audio<https://catalog.ldc.upenn.edu/LDC2017V01> was developed by the UCLA Speech Processing and Auditory Perception Laboratory<http://www.seas.ucla.edu/spapl/index.html> and comprises high-speed laryngeal video recordings of the vocal folds and synchronized audio recordings from nine subjects, collected between April 2012 and April 2013. Speakers were asked to sustain the vowel /i/ for approximately ten seconds while holding voice quality, fundamental frequency, and loudness as steady as possible.

In the field of speech production theory, data such as that contained in this release may be used to study the relationship between vocal fold vibration and the resulting voice quality.

None of the subjects had a history of a voice disorder. There was no native language requirement for recruiting subjects; participants were native speakers of various languages, including English, Mandarin Chinese, Taiwanese Mandarin, Cantonese and German. UCLA High-Speed Laryngeal Video and Audio is distributed via hard drive. 2017 Subscription Members will automatically receive copies of this corpus. 2017 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for a fee.

Membership Office
Linguistic Data Consortium<ldc.upenn.edu>
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu<mailto:ldc at ldc.upenn.edu>
M: 3600 Market St. Suite 810
Philadelphia, PA 19104
