[Corpora-List] French corpora for POS tagger evaluation

Khalid Choukri choukri at elda.org
Fri Feb 15 13:45:40 CET 2013


Dear Austina,

You may want to have a look at the EASy corpus. It is distributed as an evaluation package for syntactic analysis but can be used for other purposes.

Details: http://catalog.elra.info/product_info.php?products_id=1112&language=en

Here is a quick description: The EASy Evaluation Package was produced within the French national project EASy (Evaluation of syntactic parsers of French), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The project made it possible to carry out a campaign for the evaluation of syntactic parsers of French.

1) A collection of syntactically tagged French texts gathered over 6 domains (about one million words):
- medicine: 100,000 words, including 5,000 annotated words,
- literature: 150,000 words, including 15,000 annotated words,
- emails: 2,250 anonymised personal emails (121,000 words),
- general: 250,000 words, including 24,000 annotated words, extracted from Le Monde newspaper, reports from the French Senate and the European Assembly (MLCC, MultiLingual Corpora for Co-operation, catalogue ref: ELRA-W0023),
- speech: 10 passages of transcribed dialogues from the Spoken French corpus (8,000 annotated words),
- questions: corpus of 137,000 words, extracted from the TREC and AMARYLLIS campaigns, including 5,000 annotated words.

2) PASTK++: gathers evaluation tools for constituents and relations. It includes a version of the EASy campaign tools that were modified during the PASSAGE campaign (which followed the EASy campaigns).

3) Visualization tools for constituents and relations

Best regards,
Khalid Choukri
(short message sent from an iPad)

On 14 Feb 2013 at 21:21, Olivier Austina <olivier.austina at gmail.com> wrote:


> Hello,
>
> I am looking for a standard French corpus for POS tagger evaluation. Where can I download such a corpus? Thanks.
>
> --
> Regards
> Austina