[Corpora-List] Results: Shared task on Cross-Language Document Similarity

Serge Sharoff s.sharoff at leeds.ac.uk
Sun Aug 2 22:18:27 CEST 2015

Dear all,

In the context of our regular workshop on Building and Using Comparable Corpora, we ran a shared task on identifying comparable corpora on the Web. More information about the task is available at: https://comparable.limsi.fr/bucc2015/bucc2015-task.html

The workshop proceedings with the participating systems and the results are now available from: http://www.aclweb.org/anthology/W/W15/W15-34.pdf

To promote further research on this topic, we have made the gold-standard resources, with a standardised train/test split, available to everyone: http://corpus.leeds.ac.uk/serge/BUCC/

Feel free to use this set for any task involving research on comparable corpora. The standard reference is:

@InProceedings{sharoff-zweigenbaum-rapp:2015:BUCC,
  author    = {Sharoff, Serge and Zweigenbaum, Pierre and Rapp, Reinhard},
  title     = {BUCC Shared Task: Cross-Language Document Similarity},
  booktitle = {Proceedings of the Eighth Workshop on Building and Using Comparable Corpora},
  month     = {July},
  year      = {2015},
  address   = {Beijing, China},
  publisher = {Association for Computational Linguistics},
  pages     = {74--78},
  url       = {http://www.aclweb.org/anthology/W15-3411}
}

Best wishes, Serge
