[Corpora-List] ELRA - Language Resources Catalogue - Update

ELRA ELDA Information info at elda.org
Thu Feb 14 10:45:27 CET 2013


Our apologies if you have received multiple copies of this announcement.

*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************

ELRA is happy to announce that QUAERO Structured Named Entity Language Resources are now available in its catalogue. A Written Corpus and a Broadcast Resource annotated with Structured Named Entities from the QUAERO Programme are now being released (free for academic research):

*ELRA-W0073 Quaero Old Press Extended Named Entity corpus* This corpus consists of the manual annotation of 76 newspaper issues published in 1890-1891 and provided by the French National Library (Bibliothèque Nationale de France). Three different titles are used (Le Temps, La Croix and Le Figaro) for a total of 295 pages. The corpus is fully manually annotated according to the Quaero extended and structured named entity definition. For more information, see: http://catalog.elra.info/product_info.php?products_id=1194&language=en

*ELRA-S0349 Quaero Broadcast News Extended Named Entity corpus* This corpus consists of the manual annotation of (i) the ESTER 2 manual transcription corpus (see also ELRA-S0338) and (ii) the Quaero Speech Recognition Evaluation corpus (manual transcriptions, plus automatic transcriptions from three different ASR systems). The corpus is fully manually annotated according to the Quaero extended and structured named entity definition. For more information, see: http://catalog.elra.info/product_info.php?products_id=1195&language=en

These two corpora are described in: S. Rosset, C. Grouin, K. Fort, O. Galibert, J. Kahn, P. Zweigenbaum. Structured Named Entities in two distinct press corpora: Contemporary Broadcast News and Old Newspapers. In Proc. of LAW VI, 2012.

QUAERO is a research and innovation program addressing the automatic processing of multimedia and multilingual content, with the aim of developing new tools for navigating large volumes of text and audiovisual content. Its research and development covers automatic information retrieval, analysis, segmentation and classification of text, speech, music, image and video. The program, supported by OSEO, brings together 32 French and German partners: large groups, small and medium-sized enterprises, research laboratories and public organizations. It consists of a number of application projects aimed at industrial targets and markets, supported by a common shared research structure. Real-world data sets (corpora) are used to define the evaluation tasks and to run research challenges between partners. Systematic, periodic technology evaluation makes it possible to assess progress and to select the most promising technical and scientific approaches. After nearly five years of existence, Quaero is a very active ecosystem that has produced more than 700 scientific publications, more than 25 awards, numerous top-3 rankings in technology evaluation campaigns, 31 patent applications and many innovative prototypes.

To find out more about QUAERO, please visit the following website: http://www.quaero.org

For more information on the catalogue, please contact Valérie Mapelli mailto:mapelli at elda.org

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/LRs-Announcements.html


