It is a corpus of written French (689,000 words) comprising about fifty excerpts from texts of various periods, half of them literary and half taken from non-narrative textual genres. Every century from the 11th to the 21st is covered, which allows for diachronic analyses. The volume of annotation (198,000 referring expressions and 20,000 coreference chains grouping 2 or more expressions, including 9,000 chains grouping 3 or more expressions) was chosen to make the corpus usable for NLP applications. Full information, the licenses, and the list of contributors are available on the download web page.
For the ANR DEMOCRAT project,
Frédéric Landragin

___________________________________________________________________
Frederic Landragin
CNRS - Laboratoire Lattice - 1 rue Maurice Arnoux - 92120 Montrouge
http://www.lattice.cnrs.fr/Frederic-Landragin/