[Corpora-List] Second Call for Papers: Web as Corpus at EACL 2006

Marco Baroni baroni at sslmit.unibo.it
Thu Dec 8 21:48:01 CET 2005

Apologies for cross- and re-posting...


Call for Papers:

In conjunction with the 11th Conference of the European Chapter of the
Association for Computational Linguistics (EACL)

Trento, Italy
April 4, 2006

Workshop site:


Submission form:


Previous WaC Workshop:


Co-chairs: Adam Kilgarriff and Marco Baroni


Although a growing body of work has shown that the World
Wide Web is a mine of language data of unprecedented richness and ease
of access (see, e.g., the papers collected in Kilgarriff and
Grefenstette, 2003), many fundamental issues about the viability and
exploitation of the Web as a linguistic corpus are just starting to be
tackled, ranging from Web frequency distributions and registers, to
efficient handling of massive data sets, to copyright. Research on the
Web as corpus is currently at a very exciting stage: increasing
evidence points to the enormous potential of the Internet as a source
of linguistic data, but we are still far from a working, fully-fledged
linguists' search engine.

We invite submissions which:

- describe Web corpus collection projects, or modules for one part of
the process (crawling, filtering, language-id, tokenizing,
lemmatizing, POS-tagging, indexing, ...)

- explore characteristics of Web data, from a linguistics/NLP
perspective

- use crawled Web data for NLP purposes.

Preference will be given to projects where Web data are downloaded and
processed directly, rather than being accessed via search engine counts.

Submission Information

Authors are invited to submit full papers on original, unpublished
work in the topic area of this workshop. Submissions should follow the
two-column format of ACL proceedings and should not exceed eight (8)
pages, including references. We strongly recommend the use of ACL
LaTeX or Microsoft Word style files tailored for this year's
conference available at


Papers must conform to the official EACL-06 style guidelines, and we
reserve the right to reject submissions that do not conform to these
styles, including font size restrictions. Submissions should be in PDF
format and must include all fonts, so that the paper will print (not
just view) anywhere.

Please submit your paper no later than January 6, 2006, using the online
submission form available at


Each submission will be reviewed by at least two members of the
program committee. Accepted papers will be published in the workshop
proceedings.

Dual submissions to the main EACL 2006 conference and this workshop
are allowed; if you submit to the main session, do indicate this when
you submit to the workshop, and specify your EACL submission reference
number, for administrative ease. If your paper is accepted for the
main session, you should withdraw your paper from the workshop upon
notification by the main session.

Important Dates

January 6, 2006 - Deadline for workshop papers

January 27, 2006 - Notification of acceptance

February 10, 2006 - Camera-ready papers due

April 4, 2006 - Workshop

As the schedule is extremely tight, deadline extensions are NOT possible.

Program Committee

Marco Baroni (co-chair)
Silvia Bernardini
Massimiliano Ciaramita
Stefan Evert
William H. Fletcher
Gregory Grefenstette
Frank Keller
Adam Kilgarriff (co-chair)
Mirella Lapata
Anke Lüdeling
Philip Resnik
Serge Sharoff


Adam Kilgarriff: adam at lexmastersclass.com

Marco Baroni: baroni at sslmit.unibo.it

Further Information

Information on registration and registration fees will be provided at
the main conference site:


The EACL 2006 Workshops site:


Notice in particular the related workshop on New Text: Wikis and blogs
and other dynamic text sources:

