This workshop will bring together leaders in information retrieval and language modeling to discuss the challenges in information retrieval and how language modeling approaches may help address some of them. We will focus on the use of n-gram models to further research in areas such as document representation and content analysis, query analysis, retrieval models and ranking, and spelling, as well as on access to n-grams as an enabler of experimental design.

Workshop Aims

The aim of the workshop is to bring together a group of leaders in information retrieval and language modeling to discuss the challenges in information retrieval and how language modeling approaches may help address some of them. At the workshop, we will focus on the use of n-gram models to further research in areas such as document representation and content analysis (e.g., clustering, classification, information extraction), query analysis (e.g., query suggestion, query reformulation), retrieval models and ranking, and spelling, as well as on access to n-grams as an enabler of experimental design.

The lack of large-scale datasets and benchmarks for running experiments is often discussed in the research community. This workshop will address this issue by bringing together the community of researchers who use n-grams, both those already made available by Yahoo and Google/LDC and a new Web N-gram service through which Microsoft Research, in partnership with Microsoft Bing, is providing the research community with access to petabytes of Web N-gram data via a cloud-based platform. The Web N-gram services directly address the data need by enabling researchers to create data benchmarks for repeatable experiments and to be at the forefront of inventions based on real-world, large-scale data. The Microsoft Web N-gram services, currently in Beta <http://research.microsoft.com/web-ngram>, will be made available to participants upon request.

Previous efforts to deliver n-grams to the research community adopted a data-release approach with a cutoff on n-gram counts, which obscures long tail effects; the service-based approach makes these effects available for further study. Previous efforts also focused on the document body alone, whereas the Web N-gram service includes richer types of textual content that can engage researchers in new innovations. Another notable difference is scale: the Web N-gram service provides access to petabytes of data, up to two orders of magnitude more than currently available offerings. Finally, by providing regular data refreshes, the Web N-gram service can open up new research directions in fields where the lack of dynamic data has locked academic researchers into working with static and stale data sets.

Topics

We are now requesting paper submissions for the Web N-gram Workshop. We encourage researchers to use the Microsoft Web N-gram services to explore novel applications of language models (e.g., long tail effects) and the use of these data for enhancing the search experience (e.g., use of anchor text as a proxy for queries). We will also consider other applications such as machine translation and speech. If you would like to use the Microsoft Web N-gram services in preparing your paper, send an e-mail message to webngram at microsoft.com to request access.

We also encourage research and experiments that use or compare different n-gram data sets, with the goal of creating, at the workshop, a useful n-gram baseline for the research community in terms of the attributes researchers need, such as size, access, content, and model types. For more information, see Submissions <http://research.microsoft.com/en-us/events/webngram/submissions.aspx>.

Planned Activities

As part of the workshop, experimental results will be presented in talks (15 minutes per talk on average, plus 5 minutes for questions and answers) and in poster and/or demo sessions. In addition, there will be a panel discussion on providing access to data, focusing on the needs of academia, the challenges involved, and the opportunities for industry to provide such data.
* Paper submissions due: June 11, 2010
* Notifications sent to authors: June 28, 2010
* Camera-ready papers due: July 9, 2010
* Full-day workshop: July 23, 2010

Organizing Committee

* Chengxiang Zhai, University of Illinois at Urbana-Champaign
* David Yarowsky, Johns Hopkins University
* Evelyne Viegas, Microsoft Research
* Kuansan Wang, Microsoft Research
* Stephan Vogel, Carnegie Mellon University

Programme Committee

* Eytan Adar, University of Michigan
* Eugene Agichtein, Emory University
* Thorsten Brants, Google Research
* Jamie Callan, Carnegie Mellon University
* Kevin Chang, University of Illinois at Urbana-Champaign
* Ken Church, Johns Hopkins University
* Charlie Clarke, University of Waterloo
* Bruce Croft, University of Massachusetts Amherst
* Nick Craswell, Microsoft
* Brian Davison, Lehigh University
* Bill Dolan, Microsoft Research
* George Dupret, Yahoo! Research
* Efthimis N. Efthimiadis, University of Washington
* Michael Gamon, Microsoft Research
* Alistair Moffat, University of Melbourne
* Emmanuel Prochasson, Hong Kong University of Science & Technology
* Jian-Tao Sun, Microsoft Research Asia
* Amanda Spink, Loughborough University
* Jurgen Van Gael, University of Cambridge
* Evelyne Viegas, Microsoft Research
* Stephan Vogel, Carnegie Mellon University
* Peng Xu, Google Research
* David Yarowsky, Johns Hopkins University
* Hongyuan Zha, Georgia Institute of Technology
* Chengxiang Zhai, University of Illinois at Urbana-Champaign