[Corpora-List] ELRA - Language Resources Catalogue - Update

ELRA ELDA Information info at elda.org
Thu Apr 14 16:47:57 CEST 2016


[Our apologies if you have received multiple copies of this announcement.]

We are happy to announce that a set of Pashto Language Resources (1 Broadcast Speech Resource and 6 Written Corpora) and 1 new Multimodal Resource are now available in our catalogue.

*_Pashto Language Resources:_* This set of Pashto Language Resources was produced by ELDA within the PEA TRAD project supported by the French Ministry of Defence (DGA). It consists of 1 Broadcast Speech Resource and 6 Written Corpora. Available resources are listed below (click on the links for further details):

*ELRA-S0381 TRAD Pashto Broadcast News Speech Corpus*
ISLRN: 918-508-885-913-7 <http://islrn.org/resources/918-508-885-913-7/>
This corpus contains 108 hours of transcribed broadcast news recordings, covering more than 1,000 speakers. Transcriptions are provided together with the audio files and include about 46,000 segments and 1.1M words.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1265

*ELRA-W0092 TRAD Pashto Monolingual text Corpus*
ISLRN: 394-903-293-388-0 <http://islrn.org/resources/394-903-293-388-0/>
This is a monolingual text corpus in Pashto. The corpus contains about 112,000,000 tokens collected from 46 different blogs and websites.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1266

*ELRA-W0093 TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data*
ISLRN: 802-643-297-429-4 <http://islrn.org/resources/802-643-297-429-4/>
This corpus consists of the transcription of 106 hours of recordings in Pashto from the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381) translated into French. It contains about 832,000 source words and 747,000 target words.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1267

*ELRA-W0094 TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Test data*
ISLRN: 547-897-479-723-3 <http://islrn.org/resources/547-897-479-723-3/>
This is a parallel corpus, which contains 10,000 Pashto words translated into French. The source texts come from 3 broadcast news transcriptions of the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381).
For more information, see: http://catalog.elra.info/product_info.php?products_id=1268

*ELRA-W0095 TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data*
ISLRN: 006-102-605-738-4 <http://islrn.org/resources/006-102-605-738-4/>
This is a parallel corpus, which contains 10,000 Pashto words translated into English. The source texts come from 3 broadcast news transcriptions of the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381).
For more information, see: http://catalog.elra.info/product_info.php?products_id=1269

*ELRA-W0096 TRAD Pashto-French News Articles Parallel corpus*
ISLRN: 649-628-149-051-7 <http://islrn.org/resources/649-628-149-051-7/>
This is a parallel corpus, which contains 10,000 Pashto words translated into French by two different translators. The source texts have been collected from the following news websites: Azadiradio, Mashaal and Voice of America Pashto.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1270

*ELRA-W0097 TRAD Pashto-English News Articles Parallel corpus*
ISLRN: 612-936-517-010-2 <http://islrn.org/resources/612-936-517-010-2/>
This is a parallel corpus, which contains 10,000 Pashto words translated into English by two different translators. The source texts have been collected from the following news websites: Azadiradio, Mashaal and Voice of America Pashto.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1271

*ELRA-S0374 FoxPersonTracks: a Benchmark for Person Re-Identification from TV Broadcast Shows*
ISLRN: 168-132-570-218-1 <http://islrn.org/resources/168-132-570-218-1/>
FoxPersonTracks is a person track dataset dedicated to person re-identification. The dataset is built from a set of real-life TV shows broadcast on the French TV channels BFMTV and LCP, provided during the REPERE challenge. It contains a total of 4,604 persontracks (short video sequences featuring an individual with no background) from 266 persons. The dataset also provides re-identification results using space-time histograms as a baseline, together with an evaluation tool to ease comparison with other re-identification methods.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1264

For more information on the catalogue, please contact Valérie Mapelli (mapelli at elda.org).

If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements/


