Emily
On Wed, Dec 4, 2019 at 10:34 AM Jacob Eisenstein <jacobe at gmail.com> wrote:
> As a community, we should think carefully about whether it is appropriate
> to work with IQ test results as data, and what the applications of this
> research might be.
>
> In the United States, there is considerable evidence that IQ tests are
> racially biased. In the past, courts have excluded IQ tests from
> educational placement in California for precisely this reason. I wonder if
> there is research on this topic in the German context.
>
> It is not difficult to imagine that the outcome of this shared task would
> be a set of technologies that encode spurious correlations between
> estimates of intelligence and the linguistic features of specific racial
> groups. If such a system were trained on data that already contains biases,
> there is a risk that this bias would be not only entrenched but amplified.
> And even if the IQ test statistics are not themselves biased, an NLP system
> that predicts IQ from text could introduce bias, if there is an unmeasured
> confound that is statistically associated with both IQ and race.
>
> I hope that these issues will receive serious consideration from the
> organizers and participants in the task.
>
> Jacob Eisenstein
>
> On Wed, Dec 4, 2019 at 8:27 AM Dirk Johannßen <
> johannssen at informatik.uni-hamburg.de> wrote:
>
>> *GermEval 2020 Task 1 on the Prediction of Intellectual Ability and
>> Personality Traits from Text*
>>
>> *1st Call for Participation*
>> We invite interested parties from academia and industry to participate in
>> this shared task. Further information can be found here:
>> https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/germeval-2020-psychopred.html
>> .
>>
>> The validity of high school grades as a predictor of academic success is
>> controversial. Researchers have found indications that linguistic features
>> such as function words used in a prospective student's writing perform
>> better in predicting academic success (Pennebaker et al., 2014).
>>
>> During an aptitude test, participants are asked to write freely
>> associated texts to provided questions and images. Trained psychologists
>> can predict behavior, long-term development, and subsequent success from
>> those expressions. Paired with an IQ test and provided high school grades,
>> prediction of intellectual ability from a text can be investigated. Such an
>> approach would go beyond pure text classification and could reveal
>> insightful psychological traits.
>>
>> Operant motives are unconscious intrinsic desires that can be measured by
>> implicit or operant methods, such as the Operant Motive Test (OMT) or the
>> Motive Index (MIX). During the OMT and MIX, participants are asked
>> to write freely associated texts to provided questions and images. Trained
>> psychologists label these textual answers with one of five motives and
>> corresponding levels. The identified motives allow psychologists to predict
>> behavior, long-term development, and subsequent success. For our task, we
>> provide extensive amounts of textual data from both the OMT and the MIX,
>> paired with IQ and high school grades (MIX) and labels (OMT).
>>
>> With this task, we aim to foster research within this context. This task
>> focuses on classifying German psychological text data to predict
>> the IQ and high school grades of college applicants, as well as performing
>> speaker identification by the same image descriptions.
>>
>>
>> *Tasks*
>> This shared task consists of two subtasks, described below. Participants
>> are free to participate in either one of them or both.
>>
>> *- Subtask 1*: Prediction of Intellectual Ability. The task is to
>> predict measures of intellectual ability based solely on text. For this,
>> z-standardized high school grades and IQ scores of college applicants are
>> summed and globally ranked. The goal of this subtask is to reproduce their
>> ranking; systems are evaluated by the Pearson correlation coefficient
>> between system and gold ranking.
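
A minimal sketch of the Subtask 1 scoring as described above, not the organizers' evaluation script. The sample values, the function names, the assumption that both inputs are oriented so that higher is better, and the tie-breaking by input order are all hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def ranks(scores):
    """Rank 1 is the highest score; ties are broken by input order (assumption)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical applicants: grade points and IQ scores, higher is better.
grade_points = [11.0, 8.5, 13.0, 9.0]
iq_scores = [104.0, 98.0, 121.0, 110.0]

# Gold ranking: sum of the two z-standardized measures, ranked globally.
combined = [g + q for g, q in zip(z_scores(grade_points), z_scores(iq_scores))]
gold_ranking = ranks(combined)

# Hypothetical system output for the same four applicants.
system_ranking = [2, 4, 1, 3]

print(gold_ranking, pearson(system_ranking, gold_ranking))
```

Because both inputs to `pearson` are rankings here, the score rewards getting the relative order of applicants right rather than predicting the underlying grade or IQ values.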
>>
>> For the final results, participants of this shared task will be provided
>> with a MIX_text only and are asked to reproduce the ranking of each
>> student relative to all students in a collection (i.e. within the test
>> set).
>>
>> The data is delivered in two files, one containing participant data, the
>> other containing sample data, each being connected by a student ID. The
>> rank in the sample data reflects the averaged performance relative to all
>> instances within the collection (i.e. within train / test / dev), which is
>> to be reproduced for the task.
>>
>> *- Subtask 2*: Classification of the Operant Motive Test (OMT). Operant
>> motives are unconscious intrinsic desires that can be measured by implicit
>> or operant methods, such as the Operant Motive Test (OMT) (Kuhl and
>> Scheffer, 1999). During the OMT, participants are asked to write freely
>> associated texts to provided questions and images. An exemplary
>> illustration can be found in the Data area. Trained psychologists label
>> these textual answers with one of four motives. The identified motives
>> allow psychologists to predict behavior, long-term development, and
>> subsequent success.
>>
>> For this shared task, participants will be provided with an OMT_text and
>> are asked to predict the motive and level of each instance. The success
>> will be measured with the macro-averaged F1-score.
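
A minimal sketch of a macro-averaged F1-score as named above, under stated assumptions: the label strings are hypothetical placeholders for motive and level combinations, and averaging over the union of gold and predicted labels is one common convention that the official scorer may or may not follow.

```python
def macro_f1(gold, predicted):
    """Unweighted mean of per-label F1 over all labels seen in gold or predicted."""
    labels = sorted(set(gold) | set(predicted))
    f1_values = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        true_pos = sum(1 for g, p in pairs if g == label and p == label)
        false_pos = sum(1 for g, p in pairs if g != label and p == label)
        false_neg = sum(1 for g, p in pairs if g == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs.
        denominator = 2 * true_pos + false_pos + false_neg
        f1_values.append(2 * true_pos / denominator if denominator else 0.0)
    return sum(f1_values) / len(f1_values)


# Hypothetical labels, each standing for one motive/level combination.
gold_labels = ["motive_a_1", "motive_a_1", "motive_b_2", "motive_b_2"]
predicted_labels = ["motive_a_1", "motive_b_2", "motive_b_2", "motive_b_2"]

print(macro_f1(gold_labels, predicted_labels))
```

Macro-averaging gives every label the same weight regardless of how often it occurs, so rare motive and level combinations count as much as frequent ones.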
>>
>>
>> *Data*
>> Since 2011, the private university of applied sciences NORDAKADEMIE has
>> conducted an aptitude test for college applicants, in which participants
>> state their high school performance and take an IQ test and a psychometric
>> test called the Motive Index (MIX). The MIX measures so-called implicit or
>> operant motives by having participants answer questions about images like
>> the one displayed below, such as "who is the main person and what is
>> important for that person?" and "what is that person feeling?". Furthermore,
>> participants answer the question of what motivated them to apply to
>> the NORDAKADEMIE.
>>
>> The data consists of a unique ID per entry, one ID per participant, of
>> the applicants' major and high school grades as well as IQ scores with one
>> textual expression attached to each entry. High school grades and IQ scores
>> are z-standardized for privacy protection. In total there are 2,595
>> participants, who produced 77,850 unique MIX answers. The shortest textual
>> answers consist of 3 words, the longest of 42 and on average there are
>> roughly 15 words per textual answer with a standard deviation of 8 words.
>>
>> The available data set has been collected and hand-labeled by researchers
>> of the University of Trier. More than 14,600 volunteers participated in
>> answering questions to 15 provided images. The pairwise annotator
>> intraclass correlation was r = .85 on the Winter scale (Winter, 1994). The
>> length of the answers ranges from 4 to 79 words with a mean length of 22
>> words and a standard deviation of roughly 12 words.
>>
>> Submissions for the validation set via the Codalab page are accepted and
>> published on a leaderboard from January 1st. From May 1st, we will start
>> the final evaluation phase of the task by providing the gold labels of the
>> validation set, which can be used as additional training data.
>> Additionally, the test set samples will be provided, for which we accept
>> submissions until June 1st.
>>
>> More information can be found on the task's webpage:
>> https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/germeval-2020-psychopred.html
>>
>>
>> *Important Dates*
>> - 01-Dec-2019: Release of trial data and systems
>> - 01-Jan-2020: Release of training data (train + validation)
>> - 08-May-2020: Release of test data
>> - 01-Jun-2020: Final submission of test results
>> - 03-Jun-2020: Submission of description paper
>> - 04-11-Jun-2020: Peer reviewing: participants are expected to review
>> other participants' system descriptions
>> - 12-Jun-2020: Notification of acceptance and reviewer feedback
>> - 18-Jun-2020: Camera-ready deadline for system description papers
>> - 23-Jun-2020: Workshop in Zurich, Switzerland at the KONVENS 2020 and
>> SwissText joint conference
>>
>> The shared task will be accompanied by a pre-conference workshop of the
>> Conference on Natural Language Processing ("Konferenz zur Verarbeitung
>> natürlicher Sprache", KONVENS) hosted on June 23, 2020, at Zürich (
>> https://swisstext-and-konvens-2020.org/).
>>
>>
>> *Workshop Proceedings*
>> Description papers will appear in online workshop proceedings.
>> Participants who submit a description paper will be asked to register at
>> the workshop and present their system as a poster or in an oral
>> presentation (depending on the number of submissions).
>>
>>
>> *Organizers*
>> The shared task is organized by Dirk Johannßen, Chris Biemann, Steffen
>> Remus and Timo Baumann from the Language Technology group of the University
>> of Hamburg (https://www.inf.uni-hamburg.de/en/inst/ab/lt/home.html), as
>> well as David Scheffer from the NORDAKADEMIE Elmshorn, Nicola Baumann from
>> the Universität Trier and Gudula Ritz from Impart GmbH (Germany).
>>
>>
>> *GermEval*
>> GermEval is a series of shared task evaluation campaigns that focus on
>> Natural Language Processing for the German language. GermEval has been
>> conducted four times since 2014 in co-location with KONVENS/GSCL
>> conferences. For an overview of the currently conducted tasks, visit
>> https://swisstext-and-konvens-2020.org/shared-tasks/.
>>
>>
>>
>> --
>> Dirk Johannßen
>> Universität Hamburg
>> Department of Informatics
>> Language Technology Group (LT)
>> Vogt-Kölln-Straße 30
>> 22527 Hamburg
>>
>> Room: F-412
>>
>> johannssen at informatik.uni-hamburg.de
>> http://lt.informatik.uni-hamburg.de
>> http://www.uni-hamburg.de
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> https://mailman.uib.no/listinfo/corpora
>>
--
Emily M. Bender (she/her)
Howard and Frances Nostrand Endowed Professor
Department of Linguistics
Faculty Director, CLMS
University of Washington
Twitter: @emilymbender