[Corpora-List] 3rd WEB AS CORPUS WORKSHOP (WAC3): Call for papers

Cédrick Fairon cedrick.fairon at uclouvain.be
Wed Mar 14 23:57:01 CET 2007


------------------------------------------------------------------------
CALL FOR PAPERS
------------------------------------------------------------------------

3rd WEB AS CORPUS WORKSHOP (WAC3)
incorporating CLEANEVAL

An ACL-SIGWAC event

------------------------------------------------------------------------
Sept. 15-16, 2007
University of Louvain, Louvain-la-Neuve, Belgium

<http://cental.fltr.ucl.ac.be/wac3>
------------------------------------------------------------------------

More and more people are using Web data for linguistic and NLP research.
The workshop provides a venue for exploring how we can use it
effectively and what we will find if we do.

We invite submissions which:

* describe Web corpus collection projects, or modules for one part
of the process (crawling, filtering, language-id, tokenising,
lemmatising, POS-tagging, indexing, ...)

* explore characteristics of Web data from a linguistics/NLP
perspective, including registers, domains and frequency distributions

* use crawled Web data for NLP purposes (with emphasis on the data
rather than the use)


-- Cleaneval --

Anyone using web data needs to clean it, to get rid of unwanted material
including, for example, HTML markup, navigation bars, advertisements.
To date there has been no sharing of resources or expertise and the
cleaning has often been done minimally. Cleaneval is an exercise to
promote sharing and to improve our understanding of the issues. It will
take the now-familiar form of an open competition and shared task. More
info at Cleaneval <http://cleaneval.sigwac.org.uk>.


-- Invited speaker : Kevin Scannell --

Kevin Scannell, of Saint Louis Univ., Missouri, USA, has been working
with scholars of a range of smaller languages to develop web corpora for
those languages; the website <http://borel.slu.edu/crubadan/stadas.html>
currently lists 135 corpora/languages.


-- Previous WAC workshops --

WAC1 at Corpus Linguistics conference, Birmingham, UK, July 2005:
<http://sslmit.unibo.it/~baroni/web_as_corpus_cl05.html>.

WAC2 at EACL, Trento, Italy, April 2006:
<http://sslmit.unibo.it/~baroni/web_as_corpus_eacl06.html>.


-- Submission --

For regular submissions: papers (6-10 pages), demos (max. 2 pages) and
posters (max. 2 pages), all to be written in English.

Template files (.doc & LaTeX) are available on the WAC3 website.
Proceedings will be published in "Cahiers du Cental" at the
Louvain University Press: http://cental.fltr.ucl.ac.be/cahiers

For CLEANEVAL submissions see Cleaneval website:
<http://cleaneval.sigwac.org.uk>.

Deadline: 1 May 2007


-- Venue --

Université catholique de Louvain <http://www.uclouvain.be/en-index.html>,
in the elegant new city of Louvain-la-Neuve
<http://www.eupedia.com/belgium/louvain-la-neuve.shtml> (Belgium).
Large computer rooms will be available for demo sessions.


-- Points of contact --

Workshop Co-chairs

Cédrick Fairon, UCLouvain, Cental, fairon at tedm.ucl.ac.be
Prof. Gilles-Maurice de Schryver, Universiteit Gent

Cleaneval committee

Marco Baroni, U Trento; Secretary, SIGWAC
Tony Hartley, U Leeds
Adam Kilgarriff, Lexical Computing Ltd; Chair, SIGWAC
Serge Sharoff, U Leeds


Local organisation team

Bernadette Dehottay, UCLouvain, Cental, dehottay at tedm.ucl.ac.be
Julia Medori, CENTAL, UCLouvain
Laurent Kevers, CENTAL, UCLouvain
Hubert Naets, CENTAL, UCLouvain
Isabelle Lecroart, CENTAL, UCLouvain
Claude Devis, CENTAL, UCLouvain


Contact us:
Bernadette Dehottay
Université catholique de Louvain
Centre for Natural Language Processing (CENTAL)
Place Blaise Pascal, 1
1348 Louvain-la-Neuve
Tel. +32 10 47 37 88
Fax. +32 10 47 26 06
dehottay at tedm.ucl.ac.be




Cédrick Fairon
cedrick.fairon at uclouvain.be

Directeur du CENTAL
Centre de traitement automatique du langage
Université catholique de Louvain
Place Blaise Pascal, 1
1348 Louvain-la-Neuve
Belgique
tel: +32 10 47 37 88
fax: +32 10 47 26 06

http://cental.fltr.ucl.ac.be
http://glossa.fltr.ucl.ac.be






