[Corpora-List] Final Call for Participation: French-German Colloquium WikiCorp 2018

Harald Lüngen luengen at ids-mannheim.de
Tue Jun 26 18:15:57 CEST 2018

Please note our final Call for Participation for the French-German Colloquium on Fostering linguistic studies on Wikipedia discussions - Multilingual corpus building, annotation and exploration tools, to take place at the Université Nice Côte d'Azur, France, 9-10 July 2018. Apologies for cross-posting.



French-German Colloquium WikiCorp 2018

Fostering linguistic studies on Wikipedia discussions.

Multilingual corpus building, annotation and exploration tools

Two-day colloquium at Université Nice Côte d'Azur (FR) July 9-10, 2018

Invited speakers

David Laniado, Eurecat Barcelona
Torsten Zesch, Universität Duisburg-Essen

WikiCorp 2018 Website: http://www1.ids-mannheim.de/kl/nice2018


If you would like to participate, please fill in the registration form at http://www1.ids-mannheim.de/fileadmin/kl/nizza/registration_form.pdf and send it by 2 July 2018 via email to Celine.Poudat at unice.fr


Céline Poudat (Université Nice Côte d'Azur)
Angelika Storrer (Universität Mannheim)
Harald Lüngen (Institut für Deutsche Sprache, Mannheim)
Laura Herzberg (Universität Mannheim)

Local organisation: Céline Poudat, Daria Ciałoń, Magali Guaresi and BCL team in Nice.

Funding: Huma-Num CORLI consortium

Confirmed Participants (last updated 2018-05-02)

Natalia Grabar, STL, Université Lille 3
Laura Herzberg, Universität Mannheim
Mai Ho-Dac, CLLE-ERSS, Université Toulouse
Marc Kupietz, Institut für Deutsche Sprache, Mannheim
David Laniado, Eurecat, Barcelona
Harald Lüngen, Institut für Deutsche Sprache, Mannheim
Christophe Parisse, Head of Ortolang, MoDyCO, Université Paris X-Nanterre
Céline Poudat, BCL, Université Côte d’Azur
Angelika Storrer, Universität Mannheim
Serena Villata, Wimmics, Université Côte d’Azur
Torsten Zesch, Universität Duisburg-Essen

Location: Campus Saint-Jean-d’Angely 3, MSHS building, Salle Plate.

PRELIMINARY SCHEDULE (last updated 2018-05-29)

Monday, 9 July 2018

9:30-10:00 Opening

10:00-12:00 Section I: Joint corpus building, standards, and tools

► Short presentations on Features of the French and German Wikipedia corpora; From Wiki dump and wiki text to TEI; EuReCo idea and state of affairs

► Discussion and documentation of desiderata and requirements

12:00-13:30 Lunch

13:30-16:00 Section II: Linguistic Analyses of social interaction and conflicts

Invited talk: Torsten Zesch: Annotating, Detecting, and Understanding Stance in Computer-Mediated Debates

► Short presentations on annotation categories for linguistic analysis of interaction patterns, conflict analysis, conflict detection

► Discussion and documentation of desiderata and requirements

16:00-16:30 Coffee

16:30-17:30 Breakout Session ad Section I (a) / Breakout Session ad Section II

20:00 Dinner

Tuesday, 10 July 2018

9:30-10:00 Documentation of the Results of the two Breakout Sessions from Day 1

10:00-12:30 Section III: Corpus analysis methods

Invited talk: David Laniado: Visualisation of Wikipedia Interactions (working title)

► Short Presentations on Exploring French and German Wikipedia discussion corpora using Hyperbase/Textométrie and KorAP; Visualisation of word usage histories using word embeddings

► Discussion and documentation of desiderata and requirements

12:30-14:00 Lunch

14:00-15:00 Breakout Session ad Section III / Breakout Session ad Section I (b)

15:00-15:30 Documentation of the results of the two parallel Breakout Sessions

15:30-16:00 Coffee

16:00-17:30 ► Planning the post-conference publication.

► Planning the implementation of results, follow-up activities, projects, and further co-operation

► Wrap-up of the colloquium


Wikipedia is one of the most successful projects of Web 2.0. Since its launch in 2001, thousands of contributors have built this huge knowledge resource, which is used not only as an online encyclopedia but also as an object of research in many academic disciplines. It also constitutes a rich and unique resource for linguistic studies, first because of its multilinguality, and second because of its huge discussion spaces, in which the collaborative writing effort is negotiated. These so-called talk pages can be used as large corpus resources of Computer-Mediated Communication (CMC).

The French and German participants of the colloquium are part of an initiative which aims to foster linguistic studies on Wikipedia by providing recommendations for building standardized Wikipedia corpora, methods for their linguistic processing and exploration, and descriptors and annotations for the analysis of talk pages. The French-German team of proposers started co-operating in 2016 with a first workshop in Mannheim entitled “Wikipedia: Discourse and corpus linguistic perspectives”. Since then, the proposers and other participants have co-operated in various constellations at conferences and on joint publications and proposals. The group is now ready to prepare the ground for jointly building comparable French-German corpora to be used in cross-lingual, corpus-based analyses of Wikipedia discussions.

Up to now, most linguistic studies on Wikipedia have focused on the article pages and have not analysed in depth the linguistic features used in the discussion spaces. This may be due to three reasons: (i) Wikipedia is quite a complex object that linguists find difficult to handle; (ii) Wikipedia interactions need specific descriptors and ad hoc annotations for analysis; and (iii) existing corpus technologies and exploration tools need to be adjusted to the specificities of CMC corpora in general and Wikipedia corpora in particular. More sophisticated tools and methods for linguistic annotation and corpus exploration are needed to better exploit the huge and valuable corpus resources that can be constructed from Wikipedia discussions.

The colloquium will bring together researchers who have solid experience in preparing monolingual (French and German) corpora from Wikipedia, in disseminating them and providing corpus technology for their analysis, and in conducting linguistic research on social interaction in Wikipedia discussions, with a particular interest in the analysis and detection of conflicts.

Goals of the colloquium

The colloquium is committed to the long-term goal of building comparable French-German discussion corpora as a special type of big CMC corpora using TEI-compliant standards. These shall serve as a basis to further develop common tools and methods for the cross-lingual, corpus-based analyses of interaction, politeness and conflict.

Objectives concerning corpus building, standards, and tools: Harmonize the parameters of the hitherto separate French and German Wikipedia corpus building processes in order to make them interoperable for German-French contrastive and cross-lingual analyses: further develop the standards of the TEI CMC SIG; align metadata categories and value taxonomies.

Objectives concerning interaction analyses: Develop annotation categories for interaction patterns, politeness cues, and conflict analysis; develop a joint representation of conflict structures.

Objectives concerning corpus analysis methods: Develop and adapt corpus-linguistic methods from KorAP and Textométrie to explore and visualize cross-lingual analyses on Wikipedia discussion corpora; prepare the exploration of cross-linguistic distributional semantics by training word embedding models on the French and German Wikipedias.
