[Corpora-List] The WebNLG Challenge: Generating Text from RDF Data

Claire Gardent claire.gardent at loria.fr
Thu Apr 27 11:39:14 CEST 2017

*** Apologies for Multiple Postings ***

===================================================
The WebNLG Challenge: First Call for Participation
http://talc1.loria.fr/webnlg/stories/challenge.html
===================================================

* TASK
The task is to map RDF data to text. For instance, given the three RDF triples shown in (a), the aim is to generate a text such as (b).

a. (John_E_Blaha birthDate 1942_08_26)
   (John_E_Blaha birthPlace San_Antonio)
   (John_E_Blaha occupation Fighter_pilot)

b. John E Blaha, born in San Antonio on 1942-08-26, worked as a fighter pilot
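
To make the input and output concrete, here is a minimal Python sketch of a naive, template-based verbaliser for triples written as in (a). It is not the challenge baseline and not the official data format: the names Triple, parse_triple, TEMPLATES and verbalise, and the hand-written templates, are assumptions made purely for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    prop: str
    obj: str


def parse_triple(line: str) -> Triple:
    """Parse a line of the form '(subject property object)'."""
    subject, prop, obj = line.strip().strip("()").split()
    return Triple(subject, prop, obj)


def label(name: str) -> str:
    """Turn an underscore-separated identifier into a readable label."""
    return name.replace("_", " ")


# Hypothetical hand-written templates, one per property (illustration only).
TEMPLATES = {
    "birthDate": lambda s, o: f"{s} was born on {o.replace('_', '-')}.",
    "birthPlace": lambda s, o: f"{s} was born in {label(o)}.",
    "occupation": lambda s, o: f"{s} worked as a {label(o).lower()}.",
}


def verbalise(triples: list[Triple]) -> str:
    """Produce one sentence per triple, in input order."""
    sentences = []
    for t in triples:
        template = TEMPLATES.get(t.prop)
        if template is None:
            # Fallback for properties without a template.
            sentences.append(f"{label(t.subject)} {t.prop} {label(t.obj)}.")
        else:
            sentences.append(template(label(t.subject), t.obj))
    return " ".join(sentences)


example = [
    parse_triple("(John_E_Blaha birthDate 1942_08_26)"),
    parse_triple("(John_E_Blaha birthPlace San_Antonio)"),
    parse_triple("(John_E_Blaha occupation Fighter_pilot)"),
]
print(verbalise(example))
```

This sketch emits three separate sentences that each repeat the subject, whereas the target text (b) aggregates the three triples into a single sentence. Closing that gap is the kind of microplanning the challenge is about.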

* IMPORTANT DATES
- 11 April 2017: Release of Training and Development Data
- 30 April 2017: Release of Baseline System
- 18 August 2017: Release of Test Data
- 25 August 2017: Entry submission deadline
- 5 September 2017: Results of automatic evaluation and system presentations at INLG 2017
- 30 September 2017: Results of human evaluation

* DATA
The WebNLG dataset consists of 21,855 (data, text) pairs with a total of 8,372 distinct input units describing entities belonging to 9 distinct DBpedia categories (Astronaut, University, Monument, Building, ComicsCharacter, Food, Airport, SportsTeam and WrittenWork). To download the data, please go to the WebNLG Challenge website (see the URL at the top of this message) and register using the web form (Data Section).

* MOTIVATION
The WebNLG data was created to promote the development (i) of RDF verbalisers and (ii) of microplanners able to handle a wide range of linguistic constructions.

[RDF Verbalisers.] The RDF language in which DBpedia is encoded is widely used within the Linked Data framework. Many large-scale datasets are encoded in this language (e.g., MusicBrainz, FOAF, LinkedGeoData), and official institutions increasingly publish their data in this format. Being able to generate good-quality text from RDF data would open the way to many new applications, such as making linked data more accessible to lay users, enriching existing text with information drawn from knowledge bases, or describing, comparing and relating entities present in these knowledge bases.

[Generating linguistically rich text.] While many recent datasets for generation take as input dialogue acts, which can be viewed as trees of depth one, the WebNLG data was carefully constructed to allow for input trees of various shapes and depths and thereby allow for greater syntactic diversity in the corresponding text [1]. We hope that the WebNLG challenge will drive the NLG and deep learning communities to take up this new challenge and work on the development of generators that can handle linguistically rich texts.
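
The notion of input tree depth can be sketched in Python as follows, under the assumption that a set of triples is read as a tree whose edges run from subject to object. The function name tree_depth and the "country" triple are hypothetical and serve only as an illustration; they are not taken from the WebNLG data.

```python
from collections import defaultdict


def tree_depth(triples):
    """Length of the longest subject-to-object chain in a set of triples."""
    children = defaultdict(list)
    for subject, _prop, obj in triples:
        children[subject].append(obj)
    objects = {obj for _s, _p, obj in triples}
    # Roots are entities that never appear as the object of a triple.
    roots = [s for s in children if s not in objects]

    def height(node, seen):
        if node in seen:  # guard against cycles
            return 0
        return max(
            (1 + height(child, seen | {node}) for child in children.get(node, [])),
            default=0,
        )

    return max((height(root, frozenset()) for root in roots), default=0)


# All triples share one subject: a flat tree, like a dialogue act.
flat = [
    ("John_E_Blaha", "birthDate", "1942_08_26"),
    ("John_E_Blaha", "birthPlace", "San_Antonio"),
    ("John_E_Blaha", "occupation", "Fighter_pilot"),
]
# Hypothetical extra triple whose subject is the object of another triple.
chained = flat + [("San_Antonio", "country", "United_States")]

print(tree_depth(flat))     # 1
print(tree_depth(chained))  # 2
```

In the second input the object of one triple is the subject of another, so a generator has to describe a chain of entities instead of listing the properties of a single entity.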

* REFERENCES
Creating Training Corpora for Micro-Planners. C. Gardent, A. Shimorina, S. Narayan and L. Perez-Beltrachini. Proceedings of ACL 2017. Vancouver (Canada).

Building RDF Content for Data-to-Text Generation. L. Perez-Beltrachini, R. Sayed and C. Gardent. Proceedings of COLING 2016. Osaka (Japan).

The WebNLG Challenge: Generating Text from DBPedia Data. E. Colin, C. Gardent, Y. Mrabet, S. Narayan and L. Perez-Beltrachini. Proceedings of INLG 2016. Edinburgh (Scotland).

* ORGANISING COMMITTEE
- Claire Gardent, CNRS/LORIA, Nancy, France
- Anastasia Shimorina, CNRS/LORIA, Nancy, France
- Shashi Narayan, School of Informatics, University of Edinburgh, UK
- Laura Perez-Beltrachini, School of Informatics, University of Edinburgh, UK


webnlg2017 at inria.fr
