[Corpora-List] GermEval 2020 Task 1 on the Prediction of Intellectual Ability and Personality Traits from Text: 1st Call for Participation

David Howcroft dave.howcroft at gmail.com
Thu Dec 5 17:28:18 CET 2019


Hi all,

I am extremely grateful to those who have raised the obvious issues with this workshop: technology is being increasingly used to launder bias and it's important for us to think critically about tasks of this nature and how they will be perceived by the wider public. Indeed, we cannot limit our concerns to the wider public, because narrow segments of the public exploit vague or inappropriately operationalized research claims to advance racism and sexism.

That said, I do think it's important to be very clear that the problem is with this research topic and how it is being presented rather than where it happens to be taking place. Some of us are naturally surprised that colleagues working in Germany who we hold in high regard would think this is a good idea, because we know that the German education system particularly emphasizes how systems of classifying humans can be abused. It is important, however, to separate that surprise from any implication that doing this research in Germany *in particular* is especially concerning.

I don't think that implication is right. We should be wary of tasks like this wherever in the world they take place.

Peace,
Dave

----
David M. Howcroft
Research Associate
School of Mathematical and Computer Sciences
Heriot-Watt University

https://www.davehowcroft.com

On Thu, Dec 5, 2019 at 3:03 PM Zoltan Boka <zoltan.boka at gmail.com> wrote:


> I have an IRB background and share the concerns here; I’d argue that if
> anything they’re heightened in the European, and specifically German, context,
> given the history.
>
> The intersection between prejudice, laziness and a desire to offload human
> decision making to an algorithm, presumably so we can wash our hands and
> say “I’m not saying group X is uneducable, the School 2.0 software is
> saying it”, is a place rife with bad outcomes.
>
> We have a responsibility to decide whether to engage with projects,
> whether to speak out about them and how we can influence this trajectory
> which I fear is picking up speed.
>
> Sent from my iPhone
>
> On Dec 5, 2019, at 04:38, Mike Scott <mike at lexically.net> wrote:
>
> Dear All
>
> I would worry about any research project whose organisers chose to include
> "prediction of intellectual ability" in the very title. Presumably a
> careful choice for a big research project. When I see that this prediction
> is to be based on extremely short texts (however carefully collected), I
> think of face-recognition and its abuses, and of authoritarian regimes, and
> I worry that some time in the future our descendants will get labelled by
> scraps of text. Once you're labelled you do not easily get free from the
> label. The 19th/20th Century students of IQ thought they were doing pure
> science but the whole thing very soon got twisted and abused for all sorts
> of ends.
>
> Mike
> On 05/12/2019 07:58, Vidas Daudaravičius wrote:
>
> Dear shared task organizers,
>
> The discussion is timely and important. My highest concerns are:
> - Participants of any shared task need to decide whether to take part in
> it or to decline. The announcement of this shared task raises too many
> ethical and organizational questions that are not explained: 2,595 out
> of 14,600 participants were selected. How were they selected? Does this
> introduce bias? Probably, yes. Do the organizers have permission from the
> parents of the high school students to collect their data?
> - And yes, we are afraid of being ranked as in the novel 1984. This raises
> many more concerns than the Native Language Identification shared task did.
> It would be good to have discussions/proposals in advance for shared tasks
> that might have ethical issues.
> Explanations of the transparency, privacy, and ethics issues would help
> participants and other interested researchers not to be so emotional and
> critical.
>
> All the best with organizing shared task,
>
> Vidas Daudaravicius
>
>
> On 04/12/2019 15:08, Dirk Johannßen wrote:
>
> The data consists of a unique ID per entry, one ID per participant, the
> applicants' major and high school grades, and IQ scores, with one
> textual expression attached to each entry. High school grades and IQ scores
> are z-standardized for privacy protection. In total there are 2,595
> participants, who produced 77,850 unique MIX answers. The shortest textual
> answers consist of 3 words, the longest of 42, and on average there are
> roughly 15 words per textual answer with a standard deviation of 8 words.
>
> The available data set has been collected and hand-labeled by researchers
> of the University of Trier. More than 14,600 volunteers participated in
> answering questions to 15 provided images. The pairwise annotator
> intraclass correlation was r = .85 on the Winter scale (Winter, 1994). The
> length of the answers ranges from 4 to 79 words with a mean length of 22
> words and a standard deviation of roughly 12 words.
>
>
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> https://mailman.uib.no/listinfo/corpora
>
> --
> Mike Scott
> lexically.net
> Lexical Analysis Software and Aston University
>
>
>


