The intersection of prejudice, laziness, and a desire to offload human decision-making onto an algorithm (presumably so we can wash our hands and say, "I'm not saying group X is uneducable, the School 2.0 software is saying it") is a place rife with bad outcomes.
We have a responsibility to decide whether to engage with such projects, whether to speak out about them, and how we can influence this trajectory, which I fear is picking up speed.
> On Dec 5, 2019, at 04:38, Mike Scott <mike at lexically.net> wrote:
> Dear All
> I would worry about any research project whose organisers chose to include "prediction of intellectual ability" in the very title. Presumably a careful choice for a big research project. When I see that this prediction is to be based on extremely short texts (however carefully collected), I think of face-recognition and its abuses, and of authoritarian regimes, and I worry that some time in the future our descendants will get labelled by scraps of text. Once you're labelled you do not easily get free from the label. The 19th/20th Century students of IQ thought they were doing pure science but the whole thing very soon got twisted and abused for all sorts of ends.
>> On 05/12/2019 07:58, Vidas Daudaravičius wrote:
>> Dear shared task organizers,
>> The discussion is timely and important. My highest concerns are:
>> - Participants of any shared task need to decide whether to take part in it or to stay away. The announcement of this shared task raises too many ethical and organizational questions that are not explained: 2,595 out of 14,600 participants were selected. How were they selected? Does that introduce bias? Probably, yes. Do the organizers have permission from the parents of the high school students to collect this data?
>> - And yes, we are afraid of being ranked, as in the novel 1984. This raises many more concerns than the Native Language Identification shared task did. It is good to have discussions/proposals in advance for shared tasks that might raise ethical issues.
>> Explanations of the transparency, privacy, and ethics issues would help participants and other interested researchers not be so emotional and critical.
>> All the best with organizing shared task,
>> Vidas Daudaravicius
>>> On 04/12/2019 15:08, Dirk Johannßen wrote:
>>> The data consists of a unique ID per entry, one ID per participant, the applicants' majors and high school grades, and IQ scores, with one textual expression attached to each entry. High school grades and IQ scores are z-standardized for privacy protection. In total there are 2,595 participants, who produced 77,850 unique MIX answers. The shortest textual answers consist of 3 words, the longest of 42; on average there are roughly 15 words per textual answer, with a standard deviation of 8 words.
>>> The available data set has been collected and hand-labeled by researchers of the University of Trier. More than 14,600 volunteers participated in answering questions to 15 provided images. The pairwise annotator intraclass correlation was r = .85 on the Winter scale (Winter, 1994). The length of the answers ranges from 4 to 79 words with a mean length of 22 words and a standard deviation of roughly 12 words.
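For readers unfamiliar with the z-standardization the organizers mention for grades and IQ scores, a minimal sketch follows. This is purely illustrative and assumes nothing about the organizers' actual preprocessing pipeline; the function name and sample values are invented for the example.

```python
# Illustrative sketch of z-standardization: each value is rescaled to its
# distance from the sample mean, measured in standard deviations. The
# function name and example grades are hypothetical, not from the data set.
def z_standardize(values):
    n = len(values)
    mean = sum(values) / n
    # population variance / standard deviation
    variance = sum((v - mean) ** 2 for v in values) / n
    sd = variance ** 0.5
    return [(v - mean) / sd for v in values]

grades = [1.0, 2.0, 3.0, 2.0, 2.0]  # hypothetical raw grades
z_scores = z_standardize(grades)
```

After this transformation the scores have mean 0 and unit variance, so absolute grade and IQ values can no longer be read off directly, though the relative ordering of participants is preserved.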
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
> Mike Scott
> Lexical Analysis Software and Aston University