[Corpora-List] [PoliticEs at IberLEF2022] PoliticEs: Spanish Author Profiling for Political Ideology - Training set released!

Salud María Jiménez Zafra sjzafra at ujaen.es
Mon Mar 14 12:30:51 CET 2022


Training set released!


IberLEF 2022 Task - PoliticEs. Spanish Author Profiling for Political Ideology

Held as part of the evaluation forum IberLEF <https://sites.google.com/view/iberlef2022> in the XXXVIII edition of the International Conference of the Spanish Society for Natural Language Processing (SEPLN 2022 <https://sepln2022.grupolys.org/>)

September 20, 2022. A Coruña, Spain

Codalab link: https://codalab.lisn.upsaclay.fr/competitions/1948

Dear All,

We invite researchers and students to participate in the shared task PoliticEs: Spanish Author Profiling for Political Ideology, held as part of the evaluation forum IberLEF and co-located with the SEPLN 2022 conference.

The goal of this task is to extract the political ideology of a user from a given set of tweets. It focuses on identifying gender and profession as demographic traits, and identifying the political spectrum from a binary and multi-class perspective as a psychographic trait. To the best of our knowledge, this is the first Spanish shared task focused on extracting political ideology from a text collection.

Participants will be provided with development, development_test, training and test datasets in Spanish, drawn from an extension of PoliCorpus 2020 (García-Díaz et al., 2022). The dataset was collected during 2020 and 2021 from the Twitter accounts of politicians and political journalists in Spain using the UMUCorpusClassifier (García-Díaz et al., 2020). It comprises around 400 different users, each with at least 120 tweets. Each author is annotated with gender (male, female) and profession (journalist, politician), and with political spectrum on two axes: binary (left, right) and multiclass (left, moderate_left, moderate_right, right).
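As an illustration of the annotation scheme described above, the following Python sketch models one author's labels and rejects values outside the listed label sets. The class name and field names are hypothetical and are not taken from the released files; only the label values come from this announcement.

```python
from dataclasses import dataclass

# Label sets as listed in the announcement.
GENDERS = {"male", "female"}
PROFESSIONS = {"journalist", "politician"}
IDEOLOGY_BINARY = {"left", "right"}
IDEOLOGY_MULTICLASS = {"left", "moderate_left", "moderate_right", "right"}


@dataclass(frozen=True)
class AuthorProfile:
    """Labels for one author. Field names are hypothetical, not the dataset's."""

    gender: str
    profession: str
    ideology_binary: str
    ideology_multiclass: str

    def __post_init__(self):
        # Check every field against its allowed label set.
        checks = [
            ("gender", self.gender, GENDERS),
            ("profession", self.profession, PROFESSIONS),
            ("ideology_binary", self.ideology_binary, IDEOLOGY_BINARY),
            ("ideology_multiclass", self.ideology_multiclass, IDEOLOGY_MULTICLASS),
        ]
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")


# Example with made-up labels for a fictional author.
example = AuthorProfile("female", "politician", "left", "moderate_left")
```

Validating labels at construction time is one way to catch typos in label strings before they reach a classifier; the actual column names and file layout should be checked against the files on CodaLab.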

Moreover, to facilitate participation in the competition, a notebook with a baseline based on bag-of-words (BoW) will be provided. To download the data and the notebook and to participate, go to https://codalab.lisn.upsaclay.fr/competitions/1948.
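For readers unfamiliar with bag-of-words, the sketch below shows the general idea in plain Python. It is not the organizers' baseline notebook. It assumes a simple nearest-centroid classifier with cosine similarity, and the two training authors and their tweets are invented toy data. All function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(tweets):
    """Count lowercase word tokens over all tweets of one author."""
    counts = Counter()
    for tweet in tweets:
        counts.update(re.findall(r"\w+", tweet.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(authors):
    """Sum the bags of all authors sharing a label into one centroid per label."""
    centroids = defaultdict(Counter)
    for tweets, label in authors:
        centroids[label].update(bag_of_words(tweets))
    return dict(centroids)


def predict(centroids, tweets):
    """Return the label whose centroid is most similar to the author's bag."""
    bag = bag_of_words(tweets)
    return max(sorted(centroids), key=lambda label: cosine(bag, centroids[label]))


# Invented toy data: (tweets of one author, binary label).
toy_train = [
    (["subir el salario minimo", "mas sanidad publica"], "left"),
    (["bajar los impuestos", "menos gasto publico"], "right"),
]
model = train(toy_train)
print(predict(model, ["hay que bajar impuestos ya"]))  # prints "right"
```

Aggregating all tweets of an author into a single bag matches the author-level setting of the task, where one label is predicted per user rather than per tweet.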

Today we released the training dataset, which can be found in the "Files" subsection of the "Participate" tab. Note that this dataset includes all the instances released during the Practice stage, so there is no need to combine the two datasets.

Finally, remember that the CodaLab competition is open for submitting results on the provided development dataset, which is available in the same section as the training dataset.

Best regards,

The organizing committee



García-Díaz, J. A., Almela, Á., Alcaraz-Mármol, G., & Valencia-García, R. (2020). UMUCorpusClassifier: Compilation and evaluation of linguistic corpus for Natural Language Processing tasks. Procesamiento del Lenguaje Natural, 65, 139-142.


García-Díaz, J. A., Colomo-Palacios, R., & Valencia-García, R. (2022). Psychographic traits identification based on political ideology: An author analysis study on Spanish politicians’ tweets posted in 2020. Future Generation Computer Systems, 130(1), 59-74.

Important dates


Release of development corpora: Feb 14, 2022
Release of training corpora: Mar 14, 2022
Release of test corpora and start of evaluation campaign: Apr 18, 2022
End of evaluation campaign (deadline for runs submission): May 4, 2022
Publication of official results: May 6, 2022
Paper submission: May 29, 2022
Review notification: Jun 17, 2022
Camera ready submission: Jun 28, 2022
IberLEF Workshop (SEPLN 2022): Sep 20, 2022

Organizing committee


José Antonio García-Díaz (TECNOMOD, Universidad de Murcia)
Salud María Jiménez-Zafra (SINAI, Universidad de Jaén)
María-Teresa Martín Valdivia (SINAI, Universidad de Jaén)
Francisco García-Sánchez (TECNOMOD, Universidad de Murcia)
L. Alfonso Ureña-López (SINAI, Universidad de Jaén)
Rafael Valencia-García (TECNOMOD, Universidad de Murcia)


Salud María Jiménez Zafra sjzafra at ujaen.es

Universidad de Jaén Grupo de Investigación SINAI <http://sinai.ujaen.es/> | Departamento de Informática EPS Jaén, Edificio A3, Despacho 219 Campus Las Lagunillas s/n 23071 - Jaén | +34 953212992
