CALL FOR PARTICIPATION
French-German Colloquium WikiCorp 2018
Fostering linguistic studies on Wikipedia discussions.
Multilingual corpus building, annotation and exploration tools
Two-day colloquium at Université Nice Côte d'Azur (FR), July 9-10, 2018
Invited speakers: David Laniado (Eurecat, Barcelona); Torsten Zesch (Universität Duisburg-Essen)
WikiCorp 2018 Website: http://www1.ids-mannheim.de/kl/nice2018
If you would like to participate, please fill in the registration form at http://www1.ids-mannheim.de/fileadmin/kl/nizza/registration_form.pdf and send it by 2 July 2018 via email to Celine.Poudat at unice.fr
Céline Poudat (Université Nice Côte d'Azur)
Angelika Storrer (Universität Mannheim)
Harald Lüngen (Institut für Deutsche Sprache, Mannheim)
Laura Herzberg (Universität Mannheim)
Local organisation: Céline Poudat, Daria Ciałoń, Magali Guaresi and the BCL team in Nice.
Funding: Huma-Num CORLI consortium
Confirmed Participants (last updated 2018-05-02)
Natalia Grabar, STL, Université Lille 3
Laura Herzberg, Universität Mannheim
Mai Ho-Dac, CLLE-ERSS, Université Toulouse
Marc Kupietz, Institut für Deutsche Sprache, Mannheim
David Laniado, Eurecat, Barcelona
Harald Lüngen, Institut für Deutsche Sprache, Mannheim
Christophe Parisse, Head of Ortolang, MoDyCO, Université Paris X-Nanterre
Céline Poudat, BCL, Université Côte d’Azur
Angelika Storrer, Universität Mannheim
Serena Villata, Wimmics, Université Côte d’Azur
Torsten Zesch, Universität Duisburg-Essen
Location: Campus Saint-Jean-d’Angely 3, MSHS building, Salle Plate.
PRELIMINARY SCHEDULE (last updated 2018-05-29)
Monday, 9 July 2018
10:00-12:00 Section I: Joint corpus building, standards, and tools
► Short presentations on features of the French and German Wikipedia corpora; from Wiki dump and wiki text to TEI; EuReCo idea and state of affairs
► Discussion and documentation of desiderata and requirements
12:00-13:30 Lunch
13:30-16:00 Section II: Linguistic Analyses of social interaction and conflicts
Invited talk: Torsten Zesch: Annotating, Detecting, and Understanding Stance in Computer-Mediated Debates
► Short presentations on annotation categories for linguistic analysis of interaction patterns, conflict analysis, conflict detection
► Discussion and documentation of desiderata and requirements
16:30-17:30 Two parallel breakout sessions:
► Breakout Session ad Section I (a)
► Breakout Session ad Section II
Tuesday, 10 July 2018
9:30-10:00 Documentation of the results of the two Breakout Sessions from Day 1
10:00-12:30 Section III: Corpus analysis methods
Invited talk: David Laniado: Visualisation of Wikipedia Interactions (working title)
► Short presentations on exploring French and German Wikipedia discussion corpora using Hyperbase/Textométrie and KorAP, visualisation of word usage histories using
► Discussion and documentation of desiderata and requirements
12:30-14:00 Lunch
14:00-15:00 Two parallel breakout sessions:
► Breakout Session ad Section III
► Breakout Session ad Section I (b)
15:00-15:30 Documentation of the results of the two parallel Breakout Sessions
16:00-17:30
► Planning the post-conference publication
► Planning the implementation of results, follow-up activities, projects, and further co-operation
► Wrap-up of the colloquium
Wikipedia is one of the most successful projects of Web 2.0. Since its launch in 2001, thousands of contributors have built this huge knowledge resource, which is used not only as an online encyclopedia but also as an object of research in many academic disciplines. It also constitutes a rich and unique resource for linguistic studies, first because of its multilinguality, and second because of its huge discussion spaces, in which the collaborative writing effort is negotiated. These so-called talk pages can be used as large corpus resources of Computer-Mediated Communication (CMC).
The French and German participants of the colloquium are part of an initiative which aims to foster linguistic studies on Wikipedia by providing recommendations for building standardized Wikipedia corpora, methods for their linguistic processing and exploration, and descriptors and annotations for the analysis of talk pages. The French-German team of proposers started co-operating in 2016 with a first workshop in Mannheim entitled “Wikipedia: Discourse and corpus linguistic perspectives”. Since then, the proposers and other participants have co-operated in various constellations at conferences and on joint publications and proposals. The group is now ready to prepare the ground for jointly building comparable French-German corpora to be used in cross-lingual, corpus-based analyses of Wikipedia discussions.
Up to now, most linguistic studies on Wikipedia have focused on the article pages and have not analysed in depth the linguistic features used in the discussion spaces. This may be due to three reasons: (i) Wikipedia is quite a complex object that linguists find difficult to handle; (ii) Wikipedia interactions need specific descriptors and ad hoc annotations for analysis; and (iii) existing corpus technologies and exploration tools need to be adjusted to the specificities of CMC corpora in general and Wikipedia corpora in particular. More sophisticated tools and methods for linguistic annotation and corpus exploration are needed to better exploit the huge and valuable corpus resources that can be constructed from Wikipedia discussions.
The colloquium will bring together researchers who have solid experience in preparing monolingual (French and German) corpora from Wikipedia, in disseminating them and providing corpus technology for their analysis, and in conducting linguistic research on social interaction in Wikipedia discussions, with a particular interest in the analysis and detection of conflicts.
Goals of the colloquium
The colloquium is committed to the long-term goal of building comparable French-German discussion corpora as a special type of big CMC corpora using TEI-compliant standards. These shall serve as a basis to further develop common tools and methods for the cross-lingual, corpus-based analyses of interaction, politeness and conflict.
Objectives concerning corpus building, standards, and tools: Harmonize the parameters of the hitherto separate French and German Wikipedia corpus building processes in order to make them interoperable for German-French contrastive and cross-lingual analyses; further develop the standards of the TEI CMC SIG; align metadata categories and value taxonomies.
Objectives concerning interaction analyses: Develop annotation categories for interaction patterns, politeness cues, and conflict analysis; develop a joint representation of conflict structures.
Objectives concerning corpus analysis methods: Develop and adapt corpus-linguistic methods from KorAP and Textométrie to explore and visualize cross-lingual analyses of Wikipedia discussion corpora; prepare the exploration of cross-linguistic distributional semantics by training word embedding models on the French and German Wikipedias.