[Corpora-List] DEMOCRAT, a freely available (co)reference-level annotated corpus

Frederic Landragin frederic.landragin at ens.fr
Wed Jun 12 17:13:32 CEST 2019


We are pleased to announce that the DEMOCRAT corpus, a reference- and coreference-annotated corpus for French, is now available online under a CC BY-NC-SA 3.0 license on the Ortolang platform (Open Resources and TOols for LANGuage): https://www.ortolang.fr/market/corpora/democrat

It is a corpus of written French (689,000 words) comprising about fifty text excerpts from various periods, half of them literary and half taken from non-narrative textual genres. All centuries from the 11th to the 21st are covered, which allows for diachronic analyses. The volume of annotations (198,000 referring expressions and 20,000 coreference chains grouping 2 or more expressions, including 9,000 reference chains grouping 3 or more expressions) was chosen to make the corpus suitable for NLP applications. All information, licenses, and contributors are listed on the download web page.

For the ANR DEMOCRAT project,
Frédéric Landragin.
___________________________________________________________________
Frederic Landragin
CNRS - Laboratoire Lattice - 1 rue Maurice Arnoux - 92120 Montrouge
http://www.lattice.cnrs.fr/Frederic-Landragin/


