[Corpora-List] Discourse Annotation of French Corpora: First step

Jacques Steinlin jacques.steinlin at gmail.com
Wed Feb 25 15:01:59 CET 2015


The project, carried out at Alpage (Inria, France), consists of discourse annotation in the style of the Penn Discourse Treebank (Prasad et al. 2008). Its aim is to add a discourse annotation layer to existing morpho-syntactic annotations. The corpora in question are sections of the French Sequoia Treebank, namely texts from wikipedia.fr and from the newspaper l'Est Républicain (Candito and Seddah, 2012, https://www.rocq.inria.fr/alpage-wiki/tiki-index.php?page=CorpusSequoia), as well as the full French Treebank, a journalistic corpus of 1005 articles and 18,535 sentences (Abeillé et al. 2000, http://www.llf.cnrs.fr/fr/Gens/Abeille/French-Treebank-fr.php).

The first step was the identification of "discourse connectives" in the corpora. This involved projecting items from LexConn (a lexicon of French discourse connectives containing about 350 items) onto the corpora and keeping only those occurrences that are used discursively. In total, 11,000 connective occurrences have been identified and manually annotated.
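The post does not describe how the projection was implemented, so the following is only a minimal sketch of the general idea, under several assumptions: sentences are already tokenized, connectives are matched as contiguous token sequences with a greedy longest-match rule, and the lexicon is a hypothetical five-entry stand-in rather than LexConn's real content or file format. The names `LEXICON`, `Candidate`, `project_lexicon`, `build_candidates` and `keep_discursive` are illustrative only. The sketch produces unlabeled candidates; deciding whether each one is used discursively remains a manual step, as in the project described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical miniature stand-in for LexConn (NOT its real format or content):
# each connective is a tuple of lowercase tokens.
LEXICON: List[Tuple[str, ...]] = [
    ("mais",),
    ("donc",),
    ("parce", "que"),
    ("alors", "que"),
    ("alors",),
]


@dataclass
class Candidate:
    sent_id: int
    start: int          # index of the first token of the match
    end: int            # index one past the last token of the match
    form: str
    discursive: Optional[bool] = None  # left for a human annotator to decide


def project_lexicon(tokens: List[str], lexicon: List[Tuple[str, ...]]) -> List[Tuple[int, int, str]]:
    """Greedy longest-match search for lexicon entries in one tokenized sentence."""
    # Longer entries first, so "alors que" wins over its prefix "alors".
    entries = sorted(lexicon, key=len, reverse=True)
    lowered = [t.lower() for t in tokens]
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(lowered):
        for entry in entries:
            n = len(entry)
            if tuple(lowered[i:i + n]) == entry:
                spans.append((i, i + n, " ".join(tokens[i:i + n])))
                i += n  # skip past the match so spans never overlap
                break
        else:
            i += 1  # no entry starts at this position
    return spans


def build_candidates(sentences: List[List[str]], lexicon: List[Tuple[str, ...]]) -> List[Candidate]:
    """Project the lexicon onto every sentence, yielding unlabeled candidates."""
    return [
        Candidate(sid, start, end, form)
        for sid, tokens in enumerate(sentences)
        for start, end, form in project_lexicon(tokens, lexicon)
    ]


def keep_discursive(candidates: List[Candidate]) -> List[Candidate]:
    """After manual labeling, keep only the occurrences judged discursive."""
    return [c for c in candidates if c.discursive is True]


sentence = "Il est parti alors que Marie arrivait , mais il reviendra donc .".split()
candidates = build_candidates([sentence], LEXICON)
print([c.form for c in candidates])  # ['alors que', 'mais', 'donc']
```

A real projection would have to deal with phenomena this sketch ignores, for instance elided forms such as "parce qu'" and ambiguous items like "alors", which only a human (or a trained classifier) can separate into discursive and non-discursive uses.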

The annotated corpora, the lexicon of French discourse connectives and the annotation guide are available at: https://gforge.inria.fr/frs/?group_id=6145.

Laurence Danlos, Margot Colinet and Jacques Steinlin (Alpage/Inria)


