[Corpora-List] GermEval 2020 Task 1 on the Prediction of Intellectual Ability and Personality Traits from Text: 1st Call for Participation

Asad Sayeed asayeed at coli.uni-saarland.de
Thu Dec 5 21:32:09 CET 2019

Dear Detmar,

Like you, I am reluctant to completely shut the door on any particular area of inquiry. However, in most of the world (the German education system included), standardized testing and psychometrics are socially fraught and have a very negative history. After nearly six years as a researcher in Germany (2011-2017), I know that very many people in German institutions are especially sensitive to the kinds of ideologies that psychometrics, particularly intelligence measurement, enables. So I was as surprised as many people here to see a task proposed in a manner that, however well-meaning, seemed to gloss over the ideological conflicts underlying any attempt to identify the "language correlates of IQ test results." Researchers elsewhere, including some in industry, have rightly been called out ("burned"?) for, e.g., using facial recognition to detect criminality.

I get the impression ("second hand concerns") that some people think that this is merely an ideological import of a conflict in the USA, but Germany and other European countries are not exempt from the same social phenomena and the same dangers of indirect inference of psychometric scores.  From personal and direct observation, the German educational system produces vastly unequal outcomes (e.g. disproportionately few "Gastarbeiter"-origin Germans in higher education) partly based on discriminatory processes that are laundered into objective criteria.  As I said, this is not different from many countries, but the general sensitivity to these issues in Germany in my experience is quite high because, yes, of a bad history resulting from ideologies that -- among other things -- inferred socially-relevant personal characteristics from superficial ones.  This is what I think surprises many of us about this situation.

Luckily, I gather from the Twitter kerfuffle about this that the organizers of the task are going to clarify some of the issues there. Yes, shared tasks on detecting IQ need to be handled with the utmost care, with respect for the concerns that people have raised about intelligence metrics and inference from personal data, and with explicit knowledge of the methodological flaws underpinning IQ research and the limitations of psychometrics in general. I do not believe that this is mere "moral incense" or a personal attack on the organizers.

Yours, --Asad.

On 2019-12-05 12:06 a.m., Detmar Meurers wrote:
> Dear colleagues,
> shouldn't we as scientists be a bit more cautious and civil than
> ridiculing colleagues for creating a shared task, citing anecdotal
> evidence and second hand concerns about the interpretation of
> psychological test results?
> Have we as a community really devolved to a level where researchers
> are publicly shamed for "daring" to look for language correlates of IQ
> test results, just because IQ tests, like many psychological and
> educational tests and (psycho)medical diagnoses whose language
> correlates computational linguists are exploring, are connected to
> complex societal biases? Will organizers of shared tasks on Native
> Language Identification and Speaker ID, with their obvious potential
> for (mis)use, be burned next?
> If you're serious about societal bias in AI, there is an increasing
> number of events that you can get involved in and help make a real
> contribution to address this complex issue. Simply spewing opinions
> and moral incense into mailboxes seems to be much less beneficial.
> Best regards,
> Detmar
> On Wed, Dec 04, 2019 at 10:58:39PM +0100, Zeerak Waseem wrote:
>> Anecdotally from my own experiences in Denmark and conversations with
>> god knows how many racialised people around Europe (best estimation n
>> ~= 50): the concerns and reasons for concern described (and reasons
>> for strong language) are likely to map to a European context.
>> The racial inequalities of the societies play out in the same way (I
>> know Europeans hate admitting to similarities with the US, but they
>> are highly similar) with marginalisation of groups of people based on
>> heritage (the specific group(s) depend on the country in question),
>> (forced) ghettoisation, White flight, schools and public
>> infrastructure being poorly maintained (anecdotally: I was told
>> multiple times along the way (directly and indirectly) that I should
>> probably not seek an academic career and I’ve rarely had a
>> conversation with a racialised person who grew up in the global
>> economic north who didn’t have similar experiences), etc. Of course
>> there are also dissimilar systems of thinking between cultures and
>> having multiple cultural backgrounds could influence scores resulting
>> from processing patterns measured in IQ tests.
>> In short: based on my own experience, what I saw growing up, and
>> others’ experiences, I would be highly surprised if European
>> investigations of correlations between race and IQ showed results
>> dissimilar to those of the US investigations. And these inequities would
>> likely be built into the systems relying on data such as IQ.
>> So yeah, seconding that the strong language may be entirely
>> appropriate (and frankly Europeans should stop thinking that the
>> processes of racial discrimination here are wildly different from those in
>> the US).
>> Zeerak
>>> On 4 Dec 2019, at 22:10, Yannick Versley <yversley at gmail.com> wrote:
>>> Even understanding the German educational system and culture, I'd
>>> say that this task should light up the "irresponsible" light on the
>>> mind of any person who is (i) reasonably clear-thinking and (ii)
>>> familiar with the problems that responsible/ethical AI wants to warn
>>> us about. By overselling models and tasks that necessarily (i) lead
>>> to a poor fit overall and (ii) are prone to pick up
>>> cultural/ethnic background (to pick a less US-centric term than
>>> "race") in addition to any informative features, we're lending
>>> legitimacy to the use of similar tools that are used as a
>>> pseudo-scientific mantle to disguise (essentially) the automation of
>>> racial/ethnic/cultural discrimination and biases.
>>> People (yes, most people) desperately want automated computer
>>> decisions that work the same way that biased humans do them but with
>>> the aura of objectivity. And as the technological experts it's our
>>> duty to call bullshit on that and criticise the flawed tools as well
>>> as the processes that lead to their acceptance and/or use in
>>> production, rather than capitalizing on the desire for
>>> pseudo-science and becoming a helping party in the deception.
>>> Best wishes,
>>> Yannick
>>>> On Wed, Dec 4, 2019 at 9:24 PM Laura Dietz <dietz at cs.unh.edu> wrote:
>>>> I think it is unfair to call this task out as "irresponsible"
>>>> without understanding the German educational system and culture. I
>>>> understand that Americans have a knee-jerk response, and it is
>>>> always good to be cautious about experimental setups. However, I would have
>>>> hoped for a more measured response.
>>>> Laura Dietz
>>>>> On 12/4/19 2:00 PM, Emily M. Bender wrote:
>>>>> Thank you, Jacob, for this reply. This task seems
>>>>> irresponsible/poorly conceived to me. Before designing such a
>>>>> task, I think it is imperative to consider its use cases: When and
>>>>> why would we want to predict IQ scores or high school grades from
>>>>> text? Given the high potential for any such system to learn
>>>>> preexisting biases (themselves the result of structural
>>>>> discrimination in society), what are the likely impacts,
>>>>> especially on already marginalized populations?
>>>>> Emily
>>>>>> On Wed, Dec 4, 2019 at 10:34 AM Jacob Eisenstein
>>>>>> <jacobe at gmail.com> wrote:
>>>>>> As a community, we should think carefully about whether it is
>>>>>> appropriate to work with IQ test results as data, and what the
>>>>>> applications of this research might be.
>>>>>> In the United States, there is considerable evidence that IQ
>>>>>> tests are racially biased. In the past, courts have excluded IQ
>>>>>> tests from educational placement in California for precisely this
>>>>>> reason. I wonder if there is research on this topic in the German
>>>>>> context.
>>>>>> It is not difficult to imagine that the outcome of this shared
>>>>>> task would be a set of technologies that encode spurious
>>>>>> correlations between estimates of intelligence and the linguistic
>>>>>> features of specific racial groups. If such a system were trained
>>>>>> on data that already contains biases, there is a risk that this
>>>>>> bias would be not only entrenched but amplified. And even if the
>>>>>> IQ test statistics are not themselves biased, an NLP system that
>>>>>> predicts IQ from text could introduce bias, if there is an
>>>>>> unmeasured confound that is statistically associated with both IQ
>>>>>> and race.
>>>>>> I hope that these issues will receive serious consideration from
>>>>>> the organizers and participants in the task.
>>>>>> Jacob Eisenstein
>>>>>>> On Wed, Dec 4, 2019 at 8:27 AM Dirk Johannßen
>>>>>>> <johannssen at informatik.uni-hamburg.de> wrote:
>>>>>>> GermEval 2020 Task 1 on the Prediction of Intellectual Ability
>>>>>>> and Personality Traits from Text
>>>>>>> 1st Call for Participation
>>>>>>> We invite interested parties from academia and industry to
>>>>>>> participate in this shared task. Further information can be
>>>>>>> found here:
>>>>>>> https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/germeval-2020-psychopred.html
>>>>>>> The validity of high school grades as a predictor of academic
>>>>>>> success is controversial. Researchers have found indications
>>>>>>> that linguistic features such as function words used in a
>>>>>>> prospective student's writing perform better in predicting
>>>>>>> academic success (Pennebaker et al., 2014).
>>>>>>> During an aptitude test, participants are asked to write freely
>>>>>>> associated texts to provided questions and images. Trained
>>>>>>> psychologists can predict behavior, long-term development, and
>>>>>>> subsequent success from those expressions. Paired with an IQ
>>>>>>> test and provided high school grades, prediction of intellectual
>>>>>>> ability from a text can be investigated. Such an approach would
>>>>>>> extend the sole text classification and could reveal insightful
>>>>>>> psychological traits.
>>>>>>> Operant motives are unconscious intrinsic desires that can be
>>>>>>> measured by implicit or operant methods, such as those employed by
>>>>>>> the Operant Motive Test (OMT) or the Motive Index (MIX). During the
>>>>>>> OMT and MIX, participants are asked to write freely associated
>>>>>>> texts to provided questions and images. Trained psychologists
>>>>>>> label these textual answers with one of five motives and
>>>>>>> corresponding levels. The identified motives allow psychologists
>>>>>>> to predict behavior, long-term development, and subsequent
>>>>>>> success. For our task, we provide extensive amounts of textual
>>>>>>> data from both the OMT and the MIX, paired with IQ scores and high
>>>>>>> school grades (MIX) and with labels (OMT).
>>>>>>> With this task, we aim to foster research within this context.
>>>>>>> This task focuses on classifying German psychological text
>>>>>>> data to predict the IQ and high school grades of college
>>>>>>> applicants, as well as on performing speaker identification from
>>>>>>> the same image descriptions.
>>>>>>> Tasks
>>>>>>> This shared task consists of two subtasks, described below.
>>>>>>> Participants are free to participate in either one of them or both.
>>>>>>> - Subtask 1: Prediction of Intellectual Ability. The task is to
>>>>>>> predict measures of intellectual ability solely based on text.
>>>>>>> For this, the z-standardized high school grades and IQ scores of
>>>>>>> college applicants are summed and globally ranked. The goal of
>>>>>>> this subtask is to reproduce this ranking; systems are
>>>>>>> evaluated by the Pearson correlation coefficient between the
>>>>>>> system and gold rankings.
>>>>>>> For the final results, participants of this shared task will be
>>>>>>> provided with an MIX_text only and are asked to reproduce the
>>>>>>> ranking of each student relative to all students in a collection
>>>>>>> (i.e., within the test set).
>>>>>>> The data is delivered in two files, one containing participant
>>>>>>> data, the other containing sample data; the two are linked by
>>>>>>> a student ID. The rank in the sample data reflects the averaged
>>>>>>> performance relative to all instances within the collection
>>>>>>> (i.e. within train / test / dev), which is to be reproduced for
>>>>>>> the task.
>>>>>>> - Subtask 2: Classification of the Operant Motive Test (OMT).
>>>>>>> Operant motives are unconscious intrinsic desires that can be
>>>>>>> measured by implicit or operant methods, such as the Operant
>>>>>>> Motive Test (OMT) (Kuhl and Scheffer, 1999). During the OMT,
>>>>>>> participants are asked to write freely associated texts to
>>>>>>> provided questions and images. An exemplary illustration can be
>>>>>>> found in the Data area. Trained psychologists label these
>>>>>>> textual answers with one of four motives. The identified motives
>>>>>>> allow psychologists to predict behavior, long-term development,
>>>>>>> and subsequent success.
>>>>>>> For this shared task, participants will be provided with an
>>>>>>> OMT_text and are asked to predict the motive and level of each
>>>>>>> instance. The success will be measured with the macro-averaged
>>>>>>> F1-score.
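The two evaluation measures named above can be illustrated with a minimal Python sketch. It is not the organizers' scoring script: the function names (`zscore`, `rank_desc`, `gold_ranking`, `pearson`, `macro_f1`) are hypothetical, and it assumes that grades and IQ scores are both oriented so that a higher value means better performance (the call does not say how the German grade scale, where 1.0 is best, is handled), that ranking ties are broken by input order, and that each OMT motive/level pair is treated as one class label. Because the Pearson coefficient is computed on ranks, it equals Spearman's rank correlation when there are no ties.

```python
from statistics import mean, pstdev


# --- Subtask 1 (sketch): z-standardize, sum, rank, correlate ---

def zscore(values):
    """z-standardize: subtract the mean, divide by the population std. deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def rank_desc(values):
    """1-based ranks, highest value gets rank 1; ties keep input order (stable sort)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def gold_ranking(grades, iq):
    """Hypothetical gold ranking from summed z-scores.

    Assumption: a higher grade value means better performance.
    """
    combined = [g + q for g, q in zip(zscore(grades), zscore(iq))]
    return rank_desc(combined)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# --- Subtask 2 (sketch): macro-averaged F1 ---

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, a system that outputs one real-valued score per applicant (a hypothetical list `system_scores`) would be scored as `pearson(rank_desc(system_scores), gold_ranking(grades, iq))`. Macro-averaging weights every motive class equally, so rare motives count as much as frequent ones.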
>>>>>>> Data
>>>>>>> Since 2011, the private university of applied sciences
>>>>>>> NORDAKADEMIE has conducted an aptitude test for college applicants,
>>>>>>> in which participants state their high school performance, take
>>>>>>> an IQ test, and complete a psychometric test called the Motive Index
>>>>>>> (MIX). The MIX measures so-called implicit or operant motives by
>>>>>>> having participants answer questions about images, such as
>>>>>>> "who is the main person and what is important for that person?"
>>>>>>> and "what is that person feeling?".
>>>>>>> Furthermore, those participants answer the question of what
>>>>>>> motivated them to apply to the NORDAKADEMIE.
>>>>>>> The data consists of entries, each with a unique entry ID, a
>>>>>>> participant ID, the applicant's major, high school grades and IQ
>>>>>>> score, and one attached textual expression. High school grades and
>>>>>>> IQ scores are z-standardized for
>>>>>>> privacy protection. In total there are 2,595 participants, who
>>>>>>> produced 77,850 unique MIX answers. The shortest textual answers
>>>>>>> consist of 3 words, the longest of 42 and on average there are
>>>>>>> roughly 15 words per textual answer with a standard deviation of
>>>>>>> 8 words.
>>>>>>> The available data set has been collected and hand-labeled by
>>>>>>> researchers of the University of Trier. More than 14,600
>>>>>>> volunteers answered questions about 15 provided
>>>>>>> images. The pairwise annotator intraclass correlation was r =
>>>>>>> .85 on the Winter scale (Winter, 1994). The length of the
>>>>>>> answers ranges from 4 to 79 words with a mean length of 22 words
>>>>>>> and a standard deviation of roughly 12 words.
>>>>>>> Submissions for the validation set via the CodaLab page are
>>>>>>> accepted and published on a leaderboard from January 1st. From
>>>>>>> May 1st, we will start the final evaluation phase of the task by
>>>>>>> providing the gold labels of the validation set, which can be
>>>>>>> used as additional training data. Additionally, the test set
>>>>>>> samples will be provided, for which we accept submissions until
>>>>>>> June 1st.
>>>>>>> More information can be found on the task's webpage:
>>>>>>> https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/germeval-2020-psychopred.html
>>>>>>> Important Dates
>>>>>>> - 01-Dec-2019: Release of trial data and systems
>>>>>>> - 01-Jan-2020: Release of training data (train + validation)
>>>>>>> - 08-May-2020: Release of test data
>>>>>>> - 01-Jun-2020: Final submission of test results
>>>>>>> - 03-Jun-2020: Submission of description paper
>>>>>>> - 04-Jun to 11-Jun-2020: Peer reviewing: participants are expected to
>>>>>>> review other participants' system descriptions
>>>>>>> - 12-Jun-2020: Notification of acceptance and reviewer feedback
>>>>>>> - 18-Jun-2020: Camera-ready deadline for system description papers
>>>>>>> - 23-Jun-2020: Workshop in Zurich, Switzerland at the KONVENS
>>>>>>> 2020 and SwissText joint conference
>>>>>>> The shared task will be accompanied by a pre-conference workshop
>>>>>>> of the Conference on Natural Language Processing ("Konferenz zur
>>>>>>> Verarbeitung natürlicher Sprache", KONVENS) hosted on June 23,
>>>>>>> 2020, in Zürich (https://swisstext-and-konvens-2020.org/).
>>>>>>> Workshop Proceedings
>>>>>>> Description papers will appear in online workshop proceedings.
>>>>>>> Participants who submit a description paper will be asked to
>>>>>>> register at the workshop and present their system as a poster or
>>>>>>> in an oral presentation (depending on the number of submissions).
>>>>>>> Organizers
>>>>>>> The shared task is organized by Dirk Johannßen, Chris Biemann,
>>>>>>> Steffen Remus and Timo Baumann from the Language Technology
>>>>>>> group of the University of Hamburg
>>>>>>> (https://www.inf.uni-hamburg.de/en/inst/ab/lt/home.html), as
>>>>>>> well as David Scheffer from the NORDAKADEMIE Elmshorn, Nicola
>>>>>>> Baumann from Universität Trier, and Gudula Ritz from
>>>>>>> Impart GmbH (Germany).
>>>>>>> GermEval
>>>>>>> GermEval is a series of shared task evaluation campaigns that
>>>>>>> focus on Natural Language Processing for the German language.
>>>>>>> GermEval has been conducted four times since 2014 in co-location
>>>>>>> with KONVENS/GSCL conferences. For an overview of the currently
>>>>>>> conducted tasks, visit
>>>>>>> https://swisstext-and-konvens-2020.org/shared-tasks/.
>>>>>>> -- Dirk Johannßen
>>>>>>> Universität Hamburg
>>>>>>> Department of Informatics
>>>>>>> Language Technology Group (LT)
>>>>>>> Vogt-Kölln-Straße 30
>>>>>>> 22527 Hamburg
>>>>>>> Room: F-412
>>>>>>> johannssen at informatik.uni-hamburg.de
>>>>>>> http://lt.informatik.uni-hamburg.de
>>>>>>> http://www.uni-hamburg.de
>>>>>>> _______________________________________________
>>>>>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>>>>>> Corpora mailing list
>>>>>>> Corpora at uib.no
>>>>>>> https://mailman.uib.no/listinfo/corpora
>>>>> -- Emily M. Bender (she/her)
>>>>> Howard and Frances Nostrand Endowed Professor
>>>>> Department of Linguistics
>>>>> Faculty Director, CLMS
>>>>> University of Washington
>>>>> Twitter: @emilymbender
