I can tell you that worse things are done in private organizations, where there is no control. I would be less worried about a public, transparent research project than about what is being done behind closed doors. Does anybody know whether any of the big tech companies have walked this path before, or are walking it these days? Certainly, they would not be advertising this on any social media. Automatic psychological profiling from different angles has been around for the last decade in both public and private forums. And many schools, universities, and colleges discriminate on whatever grounds they deem fit, and nobody is pushing to shut them down.
On Thursday, December 5, 2019, 9:46:00 AM GMT, Mike Scott <mike at lexically.net> wrote:
I would worry about any research project whose organisers chose to include "prediction of intellectual ability" in the very title. Presumably a careful choice for a big research project. When I see that this prediction is to be based on extremely short texts (however carefully collected), I think of face-recognition and its abuses, and of authoritarian regimes, and I worry that some time in the future our descendants will get labelled by scraps of text. Once you're labelled you do not easily get free from the label. The 19th/20th Century students of IQ thought they were doing pure science but the whole thing very soon got twisted and abused for all sorts of ends.
On 05/12/2019 07:58, Vidas Daudaravičius wrote:
Dear shared task organizers,
The discussion is timely and important. My highest concerns are:
- Participants of any shared task need to decide whether to take part in it or to skip it. The announcement of this shared task leaves too many ethical and organizational questions unexplained: 2,595 out of 14,600 participants were selected. How were they selected? Does the selection introduce bias? Probably yes. Do the organizers have permission from the parents of high school students to collect the data?
- And yes, we are afraid of being ranked as in the novel 1984. This raises many more concerns than the Native Language Identification shared task did. It would be good to have discussions and proposals in advance for shared tasks that might raise ethical issues.
Explanations of the transparency, privacy, and ethics issues would help participants and other interested researchers to be less emotional and critical.
All the best with organizing the shared task,
On 04/12/2019 15:08, Dirk Johannßen wrote:
The data consists of a unique ID per entry, one ID per participant, the applicants' major and high school grades, and IQ scores, with one textual expression attached to each entry. High school grades and IQ scores are z-standardized for privacy protection. In total there are 2,595 participants, who produced 77,850 unique MIX answers. The shortest textual answers consist of 3 words and the longest of 42; on average there are roughly 15 words per textual answer, with a standard deviation of 8 words.
The available data set has been collected and hand-labeled by researchers of the University of Trier. More than 14,600 volunteers participated in answering questions to 15 provided images. The pairwise annotator intraclass correlation was r = .85 on the Winter scale (Winter, 1994). The length of the answers ranges from 4 to 79 words with a mean length of 22 words and a standard deviation of roughly 12 words.
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
-- Mike Scott lexically.net Lexical Analysis Software and Aston University