[Corpora-List] Call for participation. IberRDI challenge

Jesús Cid Sueiro jcid at ing.uc3m.es
Sun Mar 15 11:10:37 CET 2020

*IberRDI <http://iberrdi.webs.tsc.uc3m.es/> challenge: Call for participation*

You are invited to participate in IberRDI <http://iberrdi.webs.tsc.uc3m.es/>, a task targeting the evaluation of content similarity graphs describing three corpora related to research, development and innovation (RDI) production in the Spanish language: project proposals, scientific papers and patent applications. The task is part of the IberLEF <https://sites.google.com/view/iberlef2020/> (Iberian Languages Evaluation Forum) 2020 evaluation campaign, held at the SEPLN 2020 <http://sepln2020.sepln.org/index.php/en/home/> Congress.

*Task Description:*

The dataset is gathered from open public data sources in the Health Sciences innovation area. It consists of three document corpora, which can be downloaded here <http://iberrdi.webs.tsc.uc3m.es/download-datasets/> after registering for the task.

* *Projects*: a corpus of innovative granted projects in Health Sciences, taken from ISCIII <https://portalfis.isciii.es/es/Paginas/Busqueda.aspx> (the Spanish equivalent of the US NIH) and funded by FIS (Fondo de Investigación en Salud).

* *Publications*: a corpus of scientific publications taken from Scielo, a collection of Ibero-American journals about Health Sciences.

* *Patents*: a corpus of granted patent applications, taken from the web service of the Spanish patent office (OEPM, Oficina Española de Patentes y Marcas).


Each corpus contains the titles and abstracts of ~3000 documents and is split into two sub-collections, one for training and the other for the evaluation of the subtasks.

Participants should compute similarity graphs that provide a similarity value for any pair of documents from the corpora. The subtasks will be evaluated by comparing these similarity graphs with a set of reference graphs, which we have computed using metadata available in the original corpora.
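As an illustration of what a similarity graph over titles and abstracts might look like, here is a minimal sketch in Python. It is only one possible baseline: the task does not prescribe any method, and the choice of TF-IDF weighting with cosine similarity, the function names (`tokenize`, `tfidf_vectors`, `similarity_graph`) and the edge-dictionary representation are all assumptions made for this example, not part of the official task definition or submission format.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Lowercased word tokens; \w also matches accented Spanish letters.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    # Build L2-normalised TF-IDF vectors (assumed weighting, not the task's).
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF stays positive even for terms found in every document.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def similarity_graph(docs, threshold=0.0):
    # Return {(i, j): cosine similarity} for every pair i < j above threshold.
    vectors = tfidf_vectors(docs)
    graph = {}
    for i, j in combinations(range(len(docs)), 2):
        a, b = vectors[i], vectors[j]
        if len(a) > len(b):
            a, b = b, a  # iterate over the shorter vector
        sim = sum(w * b.get(t, 0.0) for t, w in a.items())
        if sim > threshold:
            graph[(i, j)] = sim
    return graph
```

Calling `similarity_graph` on a list of strings (for example, each title concatenated with its abstract) yields one weighted edge per document pair. With ~3000 documents per corpus this means a few million pairs, so a sparse or thresholded representation is probably preferable to a dense matrix.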

*Timeline*

* Release of reference graphs: *March 12th, 2020*

* Result submission due: *May 1st, 2020*

* Publication of results: *May 15th, 2020*

* System-description paper submission: *June 20th, 2020*

* Presentation of results at SEPLN 2020: *September 23rd to 25th, 2020*, Málaga, Spain

For further information and updates, please check:



* *David Pérez-Fernández* – Coordinator of the Spanish Language Technologies Plan (Plan TL), Secretariat of State for Digital Advancement (SEAD), Ministry of Economy, Spain

* *Jesús Cid-Sueiro* – Universidad Carlos III de Madrid, Spain

* *Jerónimo Arenas-García* – Universidad Carlos III de Madrid, Spain

* *Jorge Pereira Delgado* – Universidad Carlos III de Madrid, Spain

* *Simón Roca Sotelo* – Universidad Carlos III de Madrid, Spain

* *Doaa Samy* – Spanish Language Technologies Plan (Plan TL) & Instituto de Ingeniería del Conocimiento, Spain

* *Joseba Sanmartín-Sola* – Fundación Española para la Ciencia y la Tecnología, Spain

--
Jesus Cid-Sueiro
Desp. 4.2.D03, EPS, Universidad Carlos III de Madrid
Avda. de la Universidad, 30, 28911 Leganes, Madrid, Spain
Ph: 3491-6249174
Fax: 3491-6248749
