[Corpora-List] CLEANEVAL Web-as-Corpus exercise

Adam Kilgarriff adam at lexmasterclass.com
Tue Apr 3 18:27:00 CEST 2007

CLEANEVAL is a shared task and competitive evaluation for cleaning arbitrary
web pages, with the goal of preparing web data for use as a corpus, for
linguistic and language technology research and development. You are
invited to participate and to encourage others to do so.

Development dataset now available: <devset.html>.

* Prizes! A prize of GBP 250 will be awarded to the best
student entrant in each task (Chinese and English).
* Fuller description: <cleaneval-overview.html>.
* Timetable:


* March 2007: Development datasets released (English and Chinese)
* June 2007: Exercise. Evaluation dataset released; participants return
cleaned pages two weeks later
* End of June 2007: Papers describing systems due
* Sept 15-16 2007: Workshop, part of WAC3, Louvain-la-Neuve, Belgium


* Annotation guidelines: <annotation_guidelines.html>.
* Co-ordinators

* Marco Baroni <http://sslmit.unibo.it/~baroni/> , Trento University, Italy
* Tony Hartley <http://www.leeds.ac.uk/cts/staff/tony_hartley.htm> ,
Leeds University, UK
* Adam Kilgarriff <http://www.kilgarriff.co.uk> , Lexical Computing
Ltd., Leeds and Sussex Universities, UK
* Serge Sharoff <http://www.comp.leeds.ac.uk/ssharoff/> , Leeds
University, UK

CLEANEVAL is an activity of ACL-SIGWAC <http://sigwac.org.uk> , the
Association for Computational Linguistics (ACL) <http://www.aclweb.org>
Special Interest Group on Web as Corpus.
