[Corpora-List] ELRA - Language Resources Catalogue - Update

ELRA ELDA Information info at elda.org
Tue Apr 11 18:23:47 CEST 2017


Our apologies if you have received multiple copies of this announcement.

*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************

We are happy to announce that 1 Evaluation Package, 1 Written Corpus, 3 Desktop/Microphone Speech Resources and 1 Broadcast Speech Resource are now available in our catalogue.

*ELRA-E0046 ETAPE Evaluation Package* ISLRN: 425-777-374-455-4 <http://www.islrn.org/resources/425-777-374-455-4/>

The ETAPE Evaluation Package consists of ca. 30 hours of radio and TV data, selected to include mostly unplanned speech and a reasonable proportion of multi-speaker data. All data were carefully transcribed, including named entity annotation. This package includes the material that was used for the ETAPE evaluation campaign: resources, scoring tools, results of the campaign, etc., that were used or produced during the campaign. The aim of this evaluation package is to enable external players to evaluate their own systems and compare their results with those obtained during the campaign itself. For more information, see: http://catalog.elra.info/product_info.php?products_id=1299

*ELRA-W0117 Danish Propbank (DPB)* ISLRN: 213-212-351-142-5 <http://www.islrn.org/resources/213-212-351-142-5/>

The Danish Propbank (DPB) is an 87,000-token treebank from a variety of genres, annotated with morphosyntactic and semantic information, namely propositions/frames with VerbNet classes and semantic roles for both arguments and satellites. There are over 12,000 frames with 32,000 role instances. The corpus has also been annotated with 20 Named Entity classes and a 200-category semantic ontology for nouns. For more information, see http://catalog.elra.info/product_info.php?products_id=1300

*ELRA-S0388 GlobalPhone Bulgarian Pronunciation Dictionary 260k entries (extended version)* ISLRN: 799-402-906-876-5 <http://www.islrn.org/resources/799-402-906-876-5/>

This extended version of the Bulgarian Pronunciation Dictionary, called Bulgarian-Dict260k, contains pronunciations of more than 260,000 word forms. For more information, see: http://catalog.elra.info/product_info.php?products_id=1301

*ELRA-S0389 Accented English GlobalPhone* ISLRN: 574-579-221-841-3 <http://www.islrn.org/resources/574-579-221-841-3/>

The Accented English part of the GlobalPhone resources contains 63 recording sessions of Bulgarian, Chinese, German, and Indian native speakers reading 37 English sentences each, produced in GlobalPhone-style, i.e. 16kHz PCM encoded audio recordings of utterance-segmented read speech from the newspaper domain. For more information, see: http://catalog.elra.info/product_info.php?products_id=1302

*ELRA-S0390 Parallel EMG-Acoustic English GlobalPhone* ISLRN: 910-309-096-523-6 <http://www.islrn.org/resources/910-309-096-523-6/>

The parallel EMG-Acoustic English GlobalPhone language resource contains 63 recording sessions from 8 speakers articulating speech in three speaking modes (audible, whispered, and silent) by reading three times 50 English sentences in GlobalPhone-style, i.e. 16kHz PCM encoded audio recordings of utterance-segmented read speech from the newspaper domain. Speech is recorded in a parallel fashion, i.e. synchronously by a standard close-talking microphone and by surface electrodes capturing the activity of the articulatory muscles in the face (ElectroMyoGraphy, EMG). For more information, see: http://catalog.elra.info/product_info.php?products_id=1303

*ELRA-S0391 The FAME! Speech Corpus* ISLRN: 340-994-352-616-4 <http://www.islrn.org/resources/340-994-352-616-4/>

This Frisian corpus consists of 203 audio segments of approximately 5 minutes each, extracted from various radio programs covering a time span of almost 50 years (1966-2015), which adds a longitudinal dimension to the database. The content of the recordings is very diverse, including radio programs about culture, history, literature, sports, nature, agriculture, politics, society and languages. There are 309 identified speakers in the FAME! Speech Corpus, 21 of whom appear at least 3 times in the database. The total duration of the manually annotated radio broadcasts amounts to 18 hours, 33 minutes and 57 seconds. For more information, see: http://catalog.elra.info/product_info.php?products_id=1304

For more information on the catalogue, please contact Valérie Mapelli mailto:mapelli at elda.org

If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.

Visit our On-line Catalogue: http://catalog.elra.info

Visit the Universal Catalogue: http://universal.elra.info

Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements/
