[Corpora-List] Cross-document coreference/Entity Resolution: $50,000 Spock Challenge
eric at comp.leeds.ac.uk
Thu Apr 19 10:02:00 CEST 2007
Thanks for telling us what is in the download file, without our having to
download it! It is 97,000 files (9 GB) of raw HTML, which contestants first
have to "clean" themselves before they can try any fancy NLP stuff.
A group of European researchers from Trento and Leeds have launched
CLEANEVAL, another contest to build tidy tools for web-as-corpus
research (see http://cleaneval.sigwac.org.uk/). This could be a useful
first step for anyone trying the Spock challenge, and Spock
contestants could also enter their tidy tool in the CLEANEVAL contest!
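For anyone wondering what the very first "cleaning" pass over raw HTML might look like, here is a minimal sketch using only Python's standard html.parser module. It is an assumption-laden illustration, not the CLEANEVAL method or anything supplied with the Spock data: the names TextExtractor and html_to_text are hypothetical, and it only drops tags plus script/style contents. Real web-as-corpus cleaning (the kind of task CLEANEVAL targets) also has to strip navigation bars, ads and other page boilerplate, which this naive stripper does not attempt.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script>/<style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is not inside a skipped element
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Naive tag stripper (hypothetical): returns whitespace-normalised visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    page = "<p>Hello &amp; welcome</p><script>var x = 1;</script><p>to  the   web</p>"
    print(html_to_text(page))
```

Running a pass like this over each of the downloaded files would give plain text for later NLP steps, but expect a lot of leftover menu and footer text that a proper CLEANEVAL-style tool would try to remove.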
Eric Atwell, Leeds University
> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
> Behalf Of Alexandre Rafalovitch
> Sent: Wednesday, April 18, 2007 10:07 PM
> To: CORPORA at uib.no
> Subject: Re: [Corpora-List] Cross-document coreference/Entity Resolution:
> $50,000 Spock Challenge
> The website is rather sparse on information at the moment, so I have
> downloaded their (rather large) corpora and had a look.
> If anyone is interested in the challenge, my overview might help you
> to decide better and faster:
> Hope it helps,