[Corpora-List] Internet chat corpora research

Davis, Boyd bdavis at uncc.edu
Thu Feb 5 15:31:25 CET 2009

You will find several articles on chat in Kelsey, Sigrid and Kirk St.Amant, eds. Handbook of Research on Computer Mediated Communication. IGI Global, 2008, including one by me and Peyton Mason on textchat, which is corpus-based. Hope this helps.

Boyd Davis

Message: 1
Date: Tue, 3 Feb 2009 19:30:35 +0100
From: Manuela Speranza <manspera at fbk.eu>
Subject: [Corpora-List] EVALITA 2009 First Call for Participation
To: corpora at uib.no

<apologies for cross-posting>



Evaluation of NLP and Speech Tools for Italian



EVALITA 2009 - First Call for Participation February 2009

We are pleased to announce that registration for EVALITA 2009, the second evaluation campaign of Natural Language Processing and Speech tools for Italian, is now open. The general objective of EVALITA is to promote the development of language technologies for the Italian language, providing a shared framework where different systems and approaches can be evaluated in a consistent manner.

We invite participation, both from academic institutions and industrial organizations, in eight tasks, all for Italian. With respect to the previous edition, tasks also include a range of Speech tasks and a new Textual Entailment task.

Text tasks
- PoS-Tagging
- Parsing
  - Dependency Parsing
  - Constituency Parsing
- Lexical substitution
- Entity Recognition
  - Named Entity Recognition
  - Local Entity Detection and Recognition
- Textual Entailment

Speech tasks
- Connected Digits Recognition
  - Clean Digits
  - Noisy Digits
- Dialogue System Evaluation
- Speaker Identity Verification
  - Application
  - Forensic

As with the previous edition, guidelines describing the different tasks will be distributed to participants. Participants will also be provided with training data and will have the chance to test their systems with the evaluation metrics and procedures to be used in the formal evaluation. The results of the evaluation will be disseminated at the final workshop, which will be organized in conjunction with AI*IA 2009 and will take place in Reggio Emilia (to be confirmed).

In order to register for one or more of the EVALITA 2009 tasks, please go to the registration page of the EVALITA 2009 website: http://evalita.fbk.eu/registration.html

Please note that the important dates were changed a few weeks ago. Updated information is always available on the website at http://evalita.fbk.eu


- 1st February 2009: on-line registration opens
- 1st April 2009: development data available to participants
- 10th September 2009: test data available, registration closes
- 20th September 2009: system results due to organizers
- 5th October 2009: assessment returned to participants
- 25th October 2009: technical reports due to organizers
- 12th December 2009: final workshop, Reggio Emilia (to be confirmed)

Coordination: Bernardo Magnini (FBK-irst) and Amedeo Cappelli (ISTI-CNR and CELCT)

Task Organization:
- PoS-Tagging: Giuseppe Attardi and Maria Simi (Uni. Pisa)
- Parsing: Cristina Bosco, Alessandro Mazzei, Vincenzo Lombardo (Uni. Torino), Felice dell'Orletta (Uni. Pisa), Alessandro Lenci (Uni. Pisa) and Simonetta Montemagni (ILC-CNR, Pisa)
- Lexical Substitution: Antonio Toral (ILC-CNR, Pisa)
- Entity Recognition: Manuela Speranza (FBK-irst, Trento), Valentina Bartalesi Lenzi, Rachele Sprugnoli (CELCT, Trento)
- Textual Entailment: Johan Bos (Uni. Roma "La Sapienza"), Marco Pennacchiotti (Saarland University), Fabio Massimo Zanzotto (Uni. Roma "Tor Vergata")
- Connected Digits Recognition: Gianpaolo Coro (ABLA, Milano), Roberto Gretter (FBK-irst, Trento) and Marco Matassoni (FBK-irst, Trento)
- Spoken Dialogue Evaluation: Giuseppe Riccardi (Uni. Trento), Francesco Cutugno (Uni. Napoli "Federico II") and Roberto Pieraccini (Speechcycle, New York)
- Speaker Identity Verification: Guido Aversano (Parrot SA, Paris) and Luciano Romito (Uni. Calabria, Cosenza)

Contact: Manuela Speranza - manspera at fbk.eu


Message: 2
Date: Wed, 04 Feb 2009 14:40:14 +0000
From: Matthew Purver <mpurver at dcs.qmul.ac.uk>
Subject: [Corpora-List] SIGDIAL 2009: Preliminary Call for Papers
To: CORPORA list <corpora at hd.uib.no>


10th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Queen Mary University of London, UK
September 11-12, 2009
(right after Interspeech 2009)

Submission Deadline: April 24, 2009


The SIGDIAL venue provides a regular forum for the presentation of cutting edge research in discourse and dialogue to both academic and industry researchers. Due to the success of the nine previous SIGDIAL workshops, SIGDIAL is now a conference. The conference is sponsored by the SIGDIAL organization, which serves as the Special Interest Group in discourse and dialogue for both ACL and ISCA. SIGDIAL 2009 will be co-located with Interspeech 2009 as a satellite event.

In addition to presentations and system demonstrations, the program includes an invited talk by Professor Janet Bavelas of the University of Victoria, entitled "What's unique about dialogue?".


We welcome formal, corpus-based, implementation, experimental, or analytical work on discourse and dialogue including, but not restricted to, the following themes:

1. Discourse Processing and Dialogue Systems

Discourse semantic and pragmatic issues in NLP applications such as text summarization, question answering, information retrieval including topics like:

- Discourse structure, temporal structure, information structure;
- Discourse markers, cues and particles and their use;
- (Co-)Reference and anaphora resolution, metonymy and bridging resolution;
- Subjectivity, opinions and semantic orientation;

Spoken, multi-modal, and text/web based dialogue systems including topics such as:

- Dialogue management models;
- Speech and gesture, text and graphics integration;
- Strategies for preventing, detecting or handling miscommunication (repair and correction types, clarification and under-specificity, grounding and feedback strategies);
- Utilizing prosodic information for understanding and for disambiguation;

2. Corpora, Tools and Methodology

Corpus-based and experimental work on discourse and spoken, text-based and multi-modal dialogue including its support, in particular:

- Annotation tools and coding schemes;
- Data resources for discourse and dialogue studies;
- Corpus-based techniques and analysis (including machine learning);
- Evaluation of systems and components, including methodology, metrics and case studies;

3. Pragmatic and/or Semantic Modeling

The pragmatics and/or semantics of discourse and dialogue (i.e. beyond a single sentence) including the following issues:

- The semantics/pragmatics of dialogue acts (including those which are less studied in the semantics/pragmatics framework);
- Models of discourse/dialogue structure and their relation to referential and relational structure;
- Prosody in discourse and dialogue;
- Models of presupposition and accommodation; operational models of conversational implicature.


The program committee welcomes the submission of long papers for full plenary presentation as well as short papers and demonstrations. Short papers and demo descriptions will be featured in short plenary presentations, followed by posters and demonstrations.

- Long papers must be no longer than 8 pages, including title, examples, references, etc. In addition to this, two additional pages are allowed as an appendix which may include extended example discourses or dialogues, algorithms, graphical representations, etc.
- Short papers and demo descriptions should be 4 pages or less (including title, examples, references, etc.).

Please use the official ACL style files: http://ufal.mff.cuni.cz/acl2007/styles/

Papers that have been or will be submitted to other meetings or publications must provide this information (see submission format). SIGDIAL 2009 cannot accept for publication or presentation work that will be (or has been) published elsewhere. Any questions regarding submissions can be sent to the General Co-Chairs.

Authors are encouraged to make illustrative materials available, on the web or otherwise. Examples might include excerpts of recorded conversations, recordings of human-computer dialogues, interfaces to working systems, and so on.


In order to recognize significant advancements in dialog and discourse science and technology, SIGDIAL will (for the first time) present a BEST PAPER AWARD and a BEST STUDENT PAPER AWARD. A selection committee consisting of prominent researchers in the fields of interest will select the recipients of the awards.


Submission: April 24, 2009
Workshop: September 11-12, 2009


SIGDIAL 2009 conference website: http://www.sigdial.org/workshops/workshop10/
SIGDIAL organization website: http://www.sigdial.org/
Interspeech 2009 website: http://www.interspeech2009.org/


For any questions, please contact the appropriate members of the organizing committee:

GENERAL CO-CHAIRS
Pat Healey (Queen Mary University of London): ph at dcs.qmul.ac.uk
Roberto Pieraccini (SpeechCycle): roberto at speechcycle.com

TECHNICAL PROGRAM CO-CHAIRS
Donna Byron (Northeastern University): dbyron at ccs.neu.edu
Steve Young (University of Cambridge): sjy at eng.cam.ac.uk

LOCAL CHAIR
Matt Purver (Queen Mary University of London): mpurver at dcs.qmul.ac.uk

SIGDIAL PRESIDENT
Tim Paek (Microsoft Research): timpaek at microsoft.com

SIGDIAL VICE PRESIDENT
Amanda Stent (AT&T Labs - Research): amanda.stent at gmail.com


Gregory Aist, Arizona State University, USA
Jan Alexandersson, DFKI GmbH, Germany
Jason Baldridge, University of Texas at Austin, USA
Srinivas Bangalore, AT&T Labs - Research, USA
Dan Bohus, Microsoft Research, USA
Johan Bos, Università di Roma "La Sapienza", Italy
Charles Calloway, University of Edinburgh, UK
Rolf Carlson, Royal Institute of Technology (KTH), Sweden
Mark Core, University of Southern California, USA
David DeVault, University of Southern California, USA
Myroslava Dzikovska, University of Edinburgh, UK
Markus Egg, Rijksuniversiteit Groningen, Netherlands
Stephanie Elzer, Millersville University, USA
Mary Ellen Foster, Technical University Munich, Germany
Kallirroi Georgila, University of Edinburgh, UK
Jonathan Ginzburg, King's College London, UK
Genevieve Gorrell, Sheffield University, UK
Alexander Gruenstein, Massachusetts Institute of Technology, USA
Pat Healey, Queen Mary University of London, UK
Mattias Heldner, Royal Institute of Technology (KTH), Sweden
Beth Ann Hockey, University of California at Santa Cruz, USA
Kristiina Jokinen, University of Helsinki, Finland
Arne Jonsson, University of Linköping, Sweden
Simon Keizer, University of Cambridge, UK
John Kelleher, Dublin Institute of Technology, Ireland
Alexander Koller, University of Edinburgh, UK
Ivana Kruijff-Korbayová, Universität des Saarlandes, Germany
Staffan Larsson, Göteborg University, Sweden
Gary Geunbae Lee, Pohang University of Science and Technology, Korea
Fabrice Lefevre, University of Avignon, France
Oliver Lemon, University of Edinburgh, UK
James Lester, North Carolina State University, USA
Diane Litman, University of Pittsburgh, USA
Ramón López-Cózar, University of Granada, Spain
François Mairesse, University of Cambridge, UK
Michael McTear, University of Ulster, UK
Wolfgang Minker, University of Ulm, Germany
Sebastian Möller, Deutsche Telekom Labs and Technical University Berlin, Germany
Vincent Ng, University of Texas at Dallas, USA
Tim Paek, Microsoft Research, USA
Patrick Paroubek, LIMSI-CNRS, France
Roberto Pieraccini, SpeechCycle, USA
Paul Piwek, Open University, UK
Rashmi Prasad, University of Pennsylvania, USA
Matt Purver, Queen Mary University of London, UK
Laurent Romary, INRIA, France
Alex Rudnicky, Carnegie Mellon University, USA
Yoshinori Sagisaka, Waseda University, Japan
Ruhi Sarikaya, IBM Research, USA
Candy Sidner, BAE Systems AIT, USA
Ronnie Smith, East Carolina University, USA
Amanda Stent, AT&T Labs - Research, USA
Matthew Stone, Rutgers University, USA
Matthew Stuttle, Toshiba Research, UK
Joel Tetreault, Educational Testing Service, USA
Jason Williams, AT&T Labs - Research, USA

-- Matthew Purver - http://www.dcs.qmul.ac.uk/~mpurver/

Senior Research Fellow
Interaction, Media and Communication
Department of Computer Science
Queen Mary University of London, London E1 4NS, UK


Message: 3
Date: Wed, 04 Feb 2009 12:20:03 +0100
From: Serge Rosmorduc <s.rosmorduc at iut.univ-paris8.fr>
Subject: [Corpora-List] Second call for paper: "Natural Language Processing for Ancient Languages"
To: corpora at uib.no

Second call for papers: "Natural Language Processing for Ancient Languages"

Guest editors: Joseph Denooz and Serge Rosmorduc

The TAL journal is launching a call for papers for a special issue on NLP for Ancient Languages. "Ancient Languages" is understood here in a broad sense: we consider both dead languages (Akkadian, Ancient Egyptian, Latin...) and old stages of modern languages (Old and Middle French...).

Proposals may deal with all aspects of computer storage and processing of ancient languages, for instance:
- automated morphological or syntactic analysis of ancient languages;
- text corpora (creation of a corpus, search systems, etc.);
- dictionaries;
- encoding of ancient languages (writing systems, text representation, Unicode and ISO 10646, etc.);
- XML, TEI and ancient languages (use of XML to model ancient documents, corpus representation, DTDs or schemas for dictionaries);
- text capture, OCR and ancient texts, links between image corpora and structured representations;
- NLP as a tool for the philologist (practical uses of NLP in the context of philological or grammatical research);
- uses of NLP in teaching ancient languages;
- diachronic studies (models for language change, diachronic databases, etc.).

The journal TAL (Traitement Automatique des Langues / Natural Language Processing) is a forty-year-old international journal published by ATALA (French Association for Natural Language Processing) with the support of CNRS (National Centre for Scientific Research). It has moved to an electronic mode of publication, with printing on demand - see http://atala.org/-Revue-TAL-. This in no way affects its reviewing and selection process.

Practical issues

Contributions (25 pages maximum, PDF format) must be sent by e-mail to the following address: (rosmord _at_ iut dot univ-paris8 dot fr)

Style sheets are available for download on the Web site of the journal: http://atala.org/English-style-files.

Language: manuscripts may be submitted in English or French. French-speaking authors are requested to submit in French.

Important dates

27/02/2009 Deadline for submission.
31/04/2009 Notification to authors.
03/06/2009 Deadline for submission of a revised version.
03/07/2009 Final decision.
October 2009 Publication

Invited editorial board

François Barthélémy, CEDRIC, Conservatoire National des Arts et Métiers, France
Mahé Ben Hamed, Laboratoire Dynamique du langage, CNRS - Université Lumière Lyon 2
Francesco Citti, Université de Bologne
Joseph Denooz, LASLA, Université de Liège, Belgium
Gérard Huet, Équipe Sanscrit, INRIA, France
Wojciech Jaworski, Institute of Informatics, Warsaw University
Bastien Kindt, Institut orientaliste, Université catholique de Louvain, Belgium
George Kiraz, Gorgias Press, USA
Christiane Marchello-Nizia, ICAR, ENS Lyon
Nicolas Mazziotta, Université de Liège, Belgium
Sylvie Mellet, BCL, CNRS
Remo Mugnaioni, Centre Sciences du Langage, EA 85, Université de Provence
Mark-Jan Nederhof, University of St Andrews
Mark Olsen, ARTFL, Université de Chicago
Gerald Penn, Department of Computer Sciences, University of Toronto, Canada
Serge Rosmorduc, Équipe Langues et littératures de l'Égypte ancienne, EPHE IVe section
Wolfgang Schenkel, Université de Tübingen, Germany
Richard Sproat, University of Illinois, USA
Achim Stein, Institut für Linguistik/Romanistik, Universität Stuttgart
Paul Tombeur, CTLO (Turnhout), Belgium
Laurence Tuerlinckx, Institut orientaliste, Université catholique de Louvain, Belgium
Jerzy Tyszkiewicz, Institute of Informatics, Warsaw University
Jean Winand, Service d'Égyptologie, Université de Liège, Belgium

This call for papers is available in various formats at http://www.iut.univ-paris8.fr/~rosmord/TAL/


Message: 4
Date: Tue, 3 Feb 2009 22:14:01 +0100 (CET)
From: Paul Buitelaar <Paul.Buitelaar at dfki.de>
Subject: [Corpora-List] 2 PhD positions in Natural Language Processing
Cc: paul.buitelaar at deri.org

2 PhD positions in Natural Language Processing
DERI - National University of Ireland, Galway

The newly established Unit for Natural Language Processing at the Digital Enterprise Research Institute (DERI: http://www.deri.ie/) of the National University of Ireland, Galway invites applications for two PhD positions.

DERI is a leading research institute in semantic technologies that offers a stimulating, dynamic and multi-cultural research environment, excellent ties to research-groups worldwide, close collaboration with industrial partners and up-to-date infrastructure and resources.

The DERI Unit for Natural Language Processing (http://nlp.deri.ie/) has a focus on applied research in ontology-based information extraction, semantic-level text mining and the use of linguistic and semantic methods in information retrieval. The unit develops methods for the efficient application of NLP tools in combination with domain semantics as specified in ontologies, thesauri and other knowledge organisation systems for relevant use cases. Research is carried out in close cooperation with the DERI Unit for Information Mining and Retrieval in the context of the DERI Semantic Information Mining stream (SIM: http://sim.deri.ie/) as well as with other DERI units.

Candidates should have an excellent university degree in a relevant field of study, e.g. computer science, computational linguistics, information science, etc. with an emphasis on natural language processing. Selected candidates are expected to have the willingness to combine formal scientific work with application-oriented research and development in projects funded by national and international (EU) funding agencies.

Please send your application (CV, two letters of reference) in PDF format to the email address below by February 16th, 2009.

Further details can be obtained from:

Dr. Paul Buitelaar
Unit for Natural Language Processing
DERI - National University of Ireland, Galway
IDA Business Park, Lower Dangan, Galway, Ireland

paul dot buitelaar at deri dot org


Message: 5
Date: Thu, 5 Feb 2009 13:14:31 +0900
From: Sebastian Riedel <sebastian.riedel at gmail.com>
Subject: [Corpora-List] CfP: Workshop on Integer Linear Programming for NLP at NAACL HLT 2009
To: corpora at hd.uib.no

==========================================================
NAACL HLT 2009 Workshop on Integer Linear Programming for Natural Language Processing

June 4, 2009, Boulder, Colorado, USA
http://www-tsujii.is.s.u-tokyo.ac.jp/ilpnlp/

Call for Papers (Submission deadline: March 6, 2009)
==========================================================

Integer Linear Programming (ILP) has recently attracted much attention within the NLP community. Formulating problems using ILP has several advantages. It allows us to focus on the modelling of problems, rather than engineering new search algorithms; provides the opportunity to incorporate generic global constraints; and guarantees exact inference. This and the availability of off-the-shelf solvers has led to a large variety of natural language processing tasks being formulated in the ILP framework, including semantic role labelling, syntactic parsing, summarisation and joint information extraction.

The use of ILP brings many benefits and opportunities but there are still challenges for the community; these include: formulations of new applications, dealing with large-scale problems and understanding the interaction between learning and inference at training and decision time. The purpose of this workshop is to bring together researchers interested in exploiting ILP for NLP applications and tackling the issues involved. We are interested in a broad range of topics including, but not limited to:

- Novel ILP formulations of NLP tasks. This includes: the introduction of ILP formulations of tasks yet to be tackled within the framework; and novel formulations, such as equivalent LP relaxations, that are more efficient to process than previous formulations.

- Learning and Inference. This includes issues relating to: decoupling of learning (e.g., learning through local classifiers) and inference, learning with exact (e.g., ILP) or approximate inference, learning of constraints, learning weights for soft constraints, and the impact of ignoring various constraints during learning.

- The utility of global hard and soft constraints in NLP. Sometimes constraints do not increase accuracy (and can even decrease it); when and why do global constraints become useful? For example, do global constraints become more important if we have less data?

- Formulating and solving large NLP problems. Applying ILP to hard problems (such as parsing, machine translation and solving several NLP tasks at once) often results in very large formulations which can be impossible to solve directly by the ILP engine. This may require exploring different ILP solving methods (such as approximate ILP solvers/methods) or cutting plane and pricing techniques.

- Alternative declarative approaches. A variety of other modeling frameworks exist, of which ILP is just one instance. Using other approaches, such as weighted MAX-SAT, Constraint Satisfaction Problems (CSP) or Markov Networks, could be more suitable than ILP in some cases. It can also be helpful to model a problem in one framework (e.g., Markov Networks) and solve it with another (e.g., ILP) by using general mappings between representations.

- First Order Modelling Languages. ILP, and other essentially propositional languages, require the creation of wrapper code to generate an ILP formulation for each problem instance. First (Higher) Order languages, such as Learning Based Java and Markov Logic, reduce this overhead and can also help the solver be more efficient. Moreover, with such languages the automatic exploration of the model space is easier.


We encourage submissions addressing the above questions and topics or other relevant issues. Authors are invited to submit a full paper of up to 8 pages (with up to 1 additional page for references), or an abstract of up to 2 pages. Appropriate topics for abstracts include preliminary results, application notes, descriptions of work in progress, etc. Previously published papers cannot be accepted.

The submissions will be reviewed by the program committee. Note that reviewing will be blind and hence no author information should be included in the papers. Self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", should be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...".

Submissions will be accepted until 6 March 2009, in PDF format, via the START system at https://www.softconf.com/naacl-hlt09/ILPNLP2009/. Submissions should follow the NAACL HLT 2009 formatting requirements for full papers, found at http://clear.colorado.edu/NAACLHLT2009/stylefiles.html.

IMPORTANT DATES:
March 6, 2009: Submission deadline
March 30, 2009: Notification of acceptance
April 12, 2009: Camera-ready copies due
June 4, 2009: Workshop held in conjunction with NAACL HLT

INVITED SPEAKER: Dan Roth (University of Illinois at Urbana-Champaign)

PROGRAM COMMITTEE:
- Dan Roth (University of Illinois at Urbana-Champaign)
- Mirella Lapata (University of Edinburgh)
- Scott Yih (Microsoft Research)
- Nick Rizzolo (University of Illinois at Urbana-Champaign)
- Ming-Wei Chang (University of Illinois at Urbana-Champaign)
- Ivan Meza-Ruiz (University of Edinburgh)
- Ryan McDonald (Google Research)
- Jenny Rose Finkel (Stanford University)
- Pascal Denis (INRIA Paris-Rocquencourt)
- Manfred Klenner (University of Zurich)
- Hal Daume III (University of Utah)
- Daniel Marcu (University of Southern California)
- Kevin Knight (University of Southern California)
- Katja Filippova (EML Research)
- Mark Dras (Macquarie University)
- Hiroya Takamura (Tokyo Institute of Technology)

ORGANIZERS AND CONTACT:
- James Clarke (University of Illinois at Urbana-Champaign)
- Sebastian Riedel (University of Tokyo)

Email: ilpnlp2009 at gmail.com
Website: http://www-tsujii.is.s.u-tokyo.ac.jp/ilpnlp/


Message: 6
Date: Thu, 5 Feb 2009 14:02:04 +0100
From: Leszek Szymański <l_sz at poczta.fm>
Subject: [Corpora-List] Looking for Internet chat corpora research
To: <corpora at uib.no>

Dear Sir/Madam,

I am currently working on corpus-based research on Internet chat conversations. In this connection, I am looking for information about research on Internet chat communication that uses corpus methodology. I am especially interested in analyses of English-language corpora; however, corpus-based research on other languages will also be appreciated.

