I would worry about any research project whose organisers chose to include "prediction of intellectual ability" in the very title. Presumably a careful choice for a big research project. When I see that this prediction is to be based on extremely short texts (however carefully collected), I think of face-recognition and its abuses, and of authoritarian regimes, and I worry that some time in the future our descendants will get labelled by scraps of text. Once you're labelled you do not easily get free from the label. The 19th/20th Century students of IQ thought they were doing pure science but the whole thing very soon got twisted and abused for all sorts of ends.
On 05/12/2019 07:58, Vidas Daudaravičius wrote:
> Dear shared task organizers,
> The discussion is timely and important. My highest concerns are:
> - Participants of any shared task need to decide whether to
> participate in or to decline a shared task. The announcement of
> this shared task raises too many ethical and organizational
> questions that are not explained: 2,595 out of 14,600 participants
> were selected. How were they selected? Does this introduce bias?
> Probably, yes. Do the organizers have permission from the parents
> of the high school students to collect the data?
> - And yes, we are afraid of being ranked as in the novel 1984. This
> raises many more concerns than the Native Language Identification
> shared task did. It is good to have discussions/proposals in
> advance for shared tasks that might have ethical issues.
> Explanations of the transparency, privacy, and ethics issues would
> help participants and other interested researchers not to be so
> emotional and critical.
> All the best with organizing the shared task,
> Vidas Daudaravicius
> On 04/12/2019 15:08, Dirk Johannßen wrote:
>> The data consists of a unique ID per entry, one ID per participant,
>> the applicants' majors and high school grades, and IQ scores, with
>> one textual expression attached to each entry. High school grades
>> and IQ scores are z-standardized for privacy protection. In total
>> there are 2,595 participants, who produced 77,850 unique MIX
>> answers. The shortest textual answers consist of 3 words, the
>> longest of 42, and on average there are roughly 15 words per
>> textual answer with a standard deviation of 8 words.
>> The available data set has been collected and hand-labeled by
>> researchers of the University of Trier. More than 14,600 volunteers
>> participated in answering questions to 15 provided images. The
>> pairwise annotator intraclass correlation was r = .85 on the Winter
>> scale (Winter, 1994). The length of the answers ranges from 4 to 79
>> words with a mean length of 22 words and a standard deviation of
>> roughly 12 words.
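For readers unfamiliar with the term: the z-standardization mentioned in the quoted data description is, in its usual definition, a rescaling to zero mean and unit standard deviation. A minimal sketch follows; the variable names and sample scores are hypothetical and are not drawn from the shared task data set.

```python
def z_standardize(values):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divide by n, not n - 1).
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

# Hypothetical raw scores, for illustration only.
raw_scores = [95, 100, 105, 110, 120]
z_scores = z_standardize(raw_scores)
```

Note that z-standardized values still preserve each participant's rank and relative distance from the group mean, which is presumably part of what the privacy discussion above is concerned with.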
> Corpora mailing list
> Corpora at uib.no
--
Mike Scott
lexically.net
Lexical Analysis Software and Aston University