[Corpora-List] Extended deadline for WAC5, Web as Corpus Workshop, San Sebastian, Spain, September 7, 2009

Serge Sharoff s.sharoff
Wed Apr 15 15:22:40 CEST 2009

With apologies for cross-posting.

We are extending the deadline for submissions to our WAC5 workshop to 24 April 2009, taking into account the Easter break.

Submissions are managed via Easychair.org. Please note that EasyChair will be migrating to a new server over the coming weekend (18-19 April), so there may be a short downtime during this period.


> Call for Papers
> We invite papers on various topics concerning the use of Web resources
> for corpus research and NLP applications, including (but not limited to)
> the following:
> * linguistic Web crawler technology and Web corpus collection
> projects
> * applications of Web-derived corpora and other kinds of Web data
> * how far does the "easy way" get you? (using search engines, or
> Google's n-gram lists; we are particularly interested in a
> critical discussion of the usefulness and limitations of such
> approaches)
> * methods and tools for "cleaning" Web pages to turn them into a
> corpus
> * automatic linguistic annotation of Web data: tokenisation, POS
> tagging, lemmatisation, semantic tagging, etc. (established
> tools often perform very poorly on Web data)
> * search engine architectures for linguists: bringing linguistics
> to commercial search engines, or high-performance search
> technology to linguistics?
> * search engine-related topics such as result ranking (e.g. how to
> identify "typical" uses rather than returning 50 very similar
> matches on the first page)
> * duplicate detection, interactive query refinement, etc.
> * reviews and clever uses of search engine APIs (Google, Yahoo,
> Altavista, and in particular Microsoft's current generous Live
> Search API)
> The workshop will be held on 7 September, 2009, in San Sebastian,
> preceding SEPLN, the Spanish NLP conference:
> http://ixa2.si.ehu.es/sepln2009/
> We particularly welcome submissions on the use of languages other than
> English. One of the bottlenecks in corpus linguistic research on a
> particular language is the availability of corpora for that
> language: translation studies for, say, Ukrainian or Vietnamese are
> limited by how few diverse corpora exist for these languages. The Web
> gives the opportunity to alleviate this bottleneck, as millions of
> Ukrainian or Vietnamese texts are available on the Web, but we still do
> not know many parameters of what is there and how useful it is for
> translation, language teaching, linguistics research, etc.
> Submission information
> Authors are invited to submit full papers on original, unpublished work
> in the topic area of this workshop. Submissions should follow the format
> of ACL proceedings and should not exceed eight (8) pages, including
> references. We strongly recommend the use of ACL LaTeX or Microsoft Word
> style files tailored for this year's conference
> ( http://www.acl-ijcnlp-2009.org/main/authors/stylefiles/ ).
> Submissions are managed via Easy Chair. In order to submit a paper,
> log in at http://www.easychair.org/conferences/?conf=wac5 (or register an
> account with Easy Chair if you don't have one yet), then click New
> Submission and fill in the standard fields.
> More information about the workshop will be available from our ACL
> SIGWAC webpage: http://www.sigwac.org.uk/
