[Corpora-List] Ethical review of spoken corpus collection

Geoffrey Sampson grs2 at sussex.ac.uk
Mon Apr 11 21:31:04 CEST 2011

The contributions to this thread which I have seen so far take the line that the ethics rules being applied to speech corpora are unreasonably tight. In some respects they may be, but there is another side to the question. If one works with the "demographically sampled speech" section of the British National Corpus, for instance, then even though some measures of anonymization were applied during corpus compilation, one not infrequently comes across material that is rather damaging to identifiable individuals (often third parties who were not participants in the recorded conversations). I discuss this to some extent in section 4.1 of the documentation of my CHRISTINE Corpus (www.grsampson.net/ChrisDoc.html), and others have picked the point up and discussed it further. Last time I was actively involved with the corpus linguistics scene, this kind of problem did not seem to be adequately recognized and addressed by the research community; yet, apart from the fact that we researchers should surely want to act like decent people, the law imposes constraints that linguists don't always seem to understand. (I am fairly sure, for instance, that some of the material I found in the BNC violates the European Data Protection Directive, though possibly not the deliberately weak implementation of it within UK law.) I don't doubt that many university bureaucrats are stupidly heavy-handed in the way they go about controlling research, but there is a real issue here -- laissez-faire would not be a good idea.

Geoffrey Sampson
