[Corpora-List] ELRA - Language Resources Catalogue - Update

ELRA ELDA Information info at elda.org
Wed Dec 12 17:22:17 CET 2012



*****************************************************************
ELRA - Language Resources Catalogue - Update
*****************************************************************

ELRA is happy to announce that four new Written Corpora are now available in its catalogue.

*ELRA-W0059 LT Corpus*
The LT Corpus is composed of 70 fiction texts by renowned Portuguese authors, all dating from before 1940. The corpus contains 1,781,083 tokens. It is delivered in one file, in two formats. The txt version has one sentence per line, an identification number for each text, and no further annotation. The cqpweb file has one token per line, each followed by its POS tag and lemma, and is annotated for NP chunks.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1178

*ELRA-W0060 PTPARL Corpus*
The PTPARL Corpus contains 1,076 texts consisting of adapted transcriptions of sessions of the Portuguese Parliament. The corpus contains 1,000,441 tokens. It is delivered in one file, in two formats. The txt version has one sentence per line, an identification number for each text, and no further annotation. The cqpweb file has one token per line, each followed by its POS tag and lemma, and is annotated for NP chunks.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1179
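Both the LT and PTPARL corpora ship a token-per-line file for CQPweb, with POS tag, lemma and NP-chunk annotation. The sketch below shows one way such a file might be read to extract NP chunks. It is only a sketch under stated assumptions: the announcement does not give the column separator or the chunk markup, so this code assumes a CWB-style vertical file with tab-separated word, POS and lemma columns, and non-nested NP chunks marked by hypothetical <np> ... </np> lines. The function name read_np_chunks and the sample tokens are illustrative only.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str, str]  # (word, pos, lemma)


def read_np_chunks(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield each NP chunk as a list of (word, pos, lemma) tuples.

    ASSUMPTIONS (not stated in the announcement): one token per line with
    tab-separated word, POS tag and lemma; NP chunks delimited by
    hypothetical, non-nested <np> ... </np> lines; other structural lines
    (e.g. <s>, <text id="...">) contain no tab and are ignored.
    """
    chunk = None  # None means we are currently outside an NP chunk
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # Structural markup: starts with "<" and has no tab-separated columns.
        if line.startswith("<") and "\t" not in line:
            tag = line.strip("<>").split()[0]
            if tag == "np":
                chunk = []
            elif tag == "/np" and chunk is not None:
                yield chunk
                chunk = None
            continue
        # Token line: keep the first three columns (word, POS, lemma).
        word, pos, lemma = line.split("\t")[:3]
        if chunk is not None:
            chunk.append((word, pos, lemma))


# Hypothetical sample in the assumed layout.
sample = ["<s>", "<np>", "o\tDA\to", "livro\tCN\tlivro", "</np>", "caiu\tV\tcair", "</s>"]
print(list(read_np_chunks(sample)))
```

For a real file, pass an open file handle (e.g. opened with encoding="utf-8") instead of the sample list, and adjust the separator and tag names to whatever the delivered file actually uses.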

*ELRA-W0061 CINTIL-DependencyBank*
The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency graphs and grammatical function tags. It is composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens). The remaining 779 sentences (5,654 tokens) are used for regression testing of the computational grammar that supported the annotation of the corpus.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1180

*ELRA-W0062 CINTIL-DeepBank*
The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences annotated with their full-fledged deep grammatical representations. It is composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens) and novels (399 sentences; 3,082 tokens). The remaining 779 sentences (5,654 tokens) are used for regression testing of the computational grammar that supported the annotation of the corpus.
For more information, see: http://catalog.elra.info/product_info.php?products_id=1181

For more information on the catalogue, please contact Valérie Mapelli (mapelli at elda.org).

Visit our On-line Catalogue: http://catalog.elra.info
Visit the Universal Catalogue: http://universal.elra.info
Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/LRs-Announcements.html


