[Corpora-List] offer of research resource

Martin Wynne martin.wynne at oucs.ox.ac.uk
Wed Jun 28 11:07:01 CEST 2006

Dear Geoffrey and everyone,

I've had several messages offline asking why the OTA doesn't offer to
take this resource, so before anyone else asks, I should point out that
the Oxford Text Archive and the Arts and Humanities Data Service only
archive electronic resources, and so, unfortunately, would not be able
to offer a home for this valuable data in its current state. As I
understand it, what is needed is a traditional archive for paper
documents and magnetic media, or a project to digitise the data. (But
please correct me if I'm wrong, Geoffrey.)

If anyone out there is in a position to consider undertaking a project
to digitise it, then I understand that Professor Sampson already has a
detailed workplan. To make life even easier, the AHDS would be very
happy to offer a free service to archive, catalogue, preserve and
distribute the electronic data, on a non-exclusive basis. We could also
give advice on digitisation, if needed.

Best wishes,

Martin Wynne
Head of the Oxford Text Archive and
AHDS Literature, Languages and Linguistics

Oxford University Computing Services
13 Banbury Road
UK - OX2 6NN
Tel: +44 1865 283299
Fax: +44 1865 273275
martin.wynne at oucs.ox.ac.uk

Geoffrey Sampson wrote:

> Dear Colleagues,

> I am looking for someone who would be interested in taking over
> responsibility for a valuable research resource I have been in charge of
> in recent years.

> During the 1960s, a team of linguists sponsored by the Nuffield
> Foundation assembled a collection of the spontaneous spoken and written
> English of children and young people aged between 8+ and 15+ attending a
> variety of schools of diverse types in different urban and rural English
> regions: the "Child Language Survey". (This was initially intended as
> part of a multinational effort directed at improving foreign-language
> teaching in Europe, but I understand that parallel efforts in other
> countries fell through; the material has essentially been gathering dust
> more or less ever since it was compiled.) The leading member of the
> team was Richard Handscombe, now long since retired from a Canadian
> university and in indifferent health. After I used a small portion of
> the Survey for my LUCY treebank (www.grsampson.net/RLucy.html), Richard
> generously suggested that I should take charge of the entire Survey
> material, and arranged for it to be transported to my workplace in
> Sussex, where it now is.

> Since then, I have made repeated attempts to get funding to computerize
> this material, clearly a necessary first step to unlocking the research
> potential it contains. Although referees' reports on my various grant
> applications have been outstandingly positive, unfortunately no
> application has finally succeeded. I now find myself too close to
> retirement for a further application to be worth making; even if I
> secured funding now, I would not have time to see the work through to
> completion. Hence I would be interested in hearing from anyone younger
> who might succeed where I have failed.

> In my view the collection has unparalleled potential scientific value.
> In the first place, it creates a possibility (which otherwise scarcely
> exists) of comparing spontaneous English usage across several decades of
> time -- children of the 1960s with children now, and/or the usage of a
> generation in childhood with the usage of the same generation now it is
> middle-aged. One can envisage many significant applications to the
> study of language-skills education, for instance. One anonymous grant
> referee in 2005 commented:

>     "there is a yawning gap where there should be a research literature
> on grammatical development at school age (contrasting with a rich supply
> of research on both pre-school children and adults). What is needed
> more than anything else is precisely what this project offers:
> age-related data on speech and writing from the same children ..."

> The written portion of the material represents children's spontaneous
> writing abilities in a way which in my experience is hard to match even
> for present-day children. Collections of child writing often turn out to
> be heavily influenced by the adult prose they have consulted, but the
> Child Language Survey compilers found clever ways to get at what the
> children could do under their own steam. And the quality of the
> collection is extremely high. The spoken material has been transcribed
> with an accuracy that compares very favourably with the speech
> transcriptions in the British National Corpus (and I have the original
> tape-recordings as well as the transcriptions). The written material
> has been converted from the children's handwriting into typescript with
> astonishing care, so that for instance every crossed-out letter is
> identified. As a very rough estimate, the whole might comprise about
> 800,000 words of speech and about 200,000 words of writing.

> It will be a minor scientific tragedy, to my mind, if this material is
> lost to scholarship. Yet, if I cannot find a suitable home for it
> fairly soon, that fate looks unavoidable.

> Accordingly, I should be very happy to hear from anyone who feels able
> to rescue the Child Language Survey from oblivion. After handing it
> over, I would be willing, indeed eager, to retain an involvement, to the
> extent of advising on what I know about it, etc., but decisions would be
> for the new owner to make: I have no wish to be a back-seat driver. I
> would be quite willing to transfer the collection out of Britain -- I
> have the impression that scholarly values may be in a better state in
> some Continental European countries, for instance, than they are in
> British universities nowadays. (And I would be glad to supply
> documentation on my grant applications, referee reports, etc., if they
> would help someone else construct a case for support.)

> Anyone who would like to be considered is invited to contact me,
> commenting briefly on how he or she would hope to publish and/or exploit
> the material, and we can take it from there.

> Geoffrey Sampson

> ............................................................
> Prof. Geoffrey Sampson MA PhD MBCS CITP ILTM

> author of "The 'Language Instinct' Debate"

> Department of Informatics, University of Sussex
> Falmer, Brighton BN1 9QH, England

> www.grsampson.net +44 1273 678525
> ............................................................



