Fw: [Corpora-List] Resend: CLEANEVAL Web-as-Corpus exercise

Senta Setinc senta.setinc at triera.net
Wed Apr 4 00:31:00 CEST 2007


----- Original Message -----
From: "Senta Setinc" <senta.setinc at triera.net>
To: "Adam Kilgarriff" <adam at lexmasterclass.com>
Sent: Tuesday, April 03, 2007 11:49 PM
Subject: Re: [Corpora-List] Resend: CLEANEVAL Web-as-Corpus exercise



> Forgive me for sending you this information, which is much, much less
> important than Adam's: for those who have experienced problems with
> overload on their hard disks, there is a wonderful new (not for all of
> you, I am sure) cleaning software - a free tool named CCleaner (Crap
> Cleaner). You can download it from here: http://www.ccleaner.com
> In only a matter of seconds I gained about 1.3 gigabytes of free space.
> Amazing, really.
>
> All the best to all, Senta
> ----- Original Message -----
> From: "Adam Kilgarriff" <adam at lexmasterclass.com>
> To: <sigwac at sslmit.unibo.it>; <corpora at hd.uib.no>
> Sent: Tuesday, April 03, 2007 6:56 PM
> Subject: [Corpora-List] Resend: CLEANEVAL Web-as-Corpus exercise
>
> **Apologies for faulty links in last version**
>
> CLEANEVAL is a shared task and competitive evaluation for cleaning
> arbitrary web pages, with the goal of preparing web data for use as a
> corpus, for linguistic and language technology research and development.
> You are invited to participate, and to encourage others to do so too.
>
> Website: http://cleaneval.sigwac.org.uk
>
> Development dataset now available.
>
> * Prizes! A prize of 250.00 (GBP) will be awarded for the best
>   student entrant for each task (Chinese and English).
> * Timetable:
>   * March 2007: Development datasets released (English and Chinese)
>   * June 2007: Exercise: Evaluation dataset released and, two weeks
>     later, participants to return cleaned pages
>   * end June 2007: Papers describing systems to be submitted
>   * Sept 15-16 2007: Workshop, part of WAC3, Louvain-la-Neuve, Belgium
>     http://cental.fltr.ucl.ac.be/wac3/
>
> * Co-ordinators
>   * Marco Baroni, Trento University, Italy
>   * Tony Hartley, Leeds University, UK
>   * Adam Kilgarriff, Lexical Computing Ltd., Leeds and Sussex Univs, UK
>   * Serge Sharoff, Leeds University, UK
>
> CLEANEVAL is an activity of ACL-SIGWAC, the Association for Computational
> Linguistics (ACL) Special Interest Group on Web as Corpus.





