Final call for participation: 14th BUCC Workshop, Monday, Sept. 6, 2021 (Building and Using Comparable Corpora)

In conjunction with RANLP 2021 (online)

Monday, September 6, 2021

Workshop website: https://comparable.limsi.fr/bucc2021/

RANLP website: https://ranlp.org/ranlp2021


Invited speakers:

* Pushpak Bhattacharyya https://www.cse.iitb.ac.in/~pb/

* Tomas Mikolov https://ricaip.eu/home/responsible-research-and-innovation-strategy/research-teams/tomas-mikolov

* Sujith Ravi https://www.sravi.org/


The workshop will be held on Monday, Sept. 6, 2021 using Zoom. You can still register for it on the RANLP main conference website at https://ranlp.org/ranlp2021/fees.php . The registration fee for non-presenters is 15 Euros and there is no late registration surcharge.

Please find the workshop programme below. A formatted version will be posted on the workshop website (see URL above) by Sept. 4. The proceedings will follow by Sept. 5.


Programme 14th BUCC Workshop, Monday, Sept. 6, 2021 (subject to change)

All times are in UTC+0

For a time zone converter and time difference calculator, see e.g. https://www.timeanddate.com/worldclock/converter.html

Note that during summer time (which is applicable on Sept. 6) London is at UTC+1

8:00 - 8:05 Welcome

8:05 - 9:00 Invited presentation

     Machine Translation in Low Resource Setting

     Pushpak Bhattacharyya

9:00 - 9:25

     EM Corpus: a comparable corpus for a less-resourced language pair


     Rudali Huidrom, Yves Lepage and Khogendra Khomdram

9:25 - 9:40 Coffee break

9:40 - 10:05

     Mining Bilingual Word Pairs from Comparable Corpus using Apache Spark Framework

     Sanjanasri JP, Vijay Krishna Menon, Soman KP and Krzysztof Wolk

10:05 - 10:30

     Employing Wikipedia as a resource for Named Entity Recognition in Morphologically complex under-resourced languages

     Aravind Krishnan, Stefan Ziehe, Franziska Pannach and Caroline Sporleder

10:30 - 10:55

     Semi-Automated Labeling of Requirement Datasets for Relation


     Jeremias Bohn, Jannik Fischbach, Martin Schmitt, Hinrich Schütze and Andreas Vogelsang

10:55 - 11:20

     A Dutch Dataset for Cross-lingual Multilabel Toxicity Detection

     Ben Burtenshaw and Mike Kestemont

11:20 - 12:10 Lunch break

12:10 - 13:05 Invited presentation

     Topic: Language modeling and AI

     Tomas Mikolov

13:05 - 13:30

     Syntax-aware Transformers for Neural Machine Translation: The Case of Text to Sign Gloss Translation

     Santiago Egea Gómez, Euan McGill and Horacio Saggion

13:30 - 13:55

     Effective Bitext Extraction from Comparable Corpora Using a Combination of Three Different Approaches

     Steinþór Steingrímsson, Pintu Lohar, Hrafn Loftsson and Andy Way

13:55 - 14:10 Coffee break

14:10 - 14:35

     Majority Voting with Bidirectional Pre-translation For Bitext


     Alexander G. Jones and Derry Tanti Wijaya

14:35 - 15:00

     Extracting IPA in Wiktionary: Experiments on Multilingual Syllabification and Stress Prediction

     Winston Wu and David Yarowsky

15:00 - 15:55 Invited presentation

     Title tba

     Sujith Ravi

15:55 - 16:00 Closing

