[Corpora-List] Written BNC2014 data collection

Gillings, Mathew m.gillings at lancaster.ac.uk
Wed Oct 17 14:03:27 CEST 2018

Hi all,

We are currently compiling the Written British National Corpus 2014, a project led by Lancaster University to create a successor to the 100-million-word BNC. The corpus will allow diachronic comparisons with the original BNC (BNC1994) while being representative of current British English. We are collecting samples from fiction, academic journals, newspapers, magazines, blogs and more.

To make the corpus as relevant to present-day language as possible, we are including just under 10 million words of e-language. Some of this, such as online blogs and reviews, is easy to obtain, but, as you might expect, other forms of e-language are harder to collect. In particular, we are looking for more email communication and more text messages / instant messages.

This type of language is notoriously difficult to collect because it often contains personal information, and people outside the corpus linguistics community may have concerns about how their language will be used within a "corpus". For that reason, we are reaching out to the corpus linguistics community. If you are an L1 speaker of British English and would be willing to contribute your email communication (personal or professional) or your instant messages, please visit our project website<http://cass.lancs.ac.uk/bnc2014/> and follow the instructions on how to submit. We are especially looking for online chat logs.

If it's easier, you can simply forward your email communication to writtenbnc2014 at gmail.com<mailto:writtenbnc2014 at gmail.com>, and (assuming it is British English) we will include it.

Just get in touch if you have any questions and we'd be happy to advise.


The Written BNC2014 team

Mathew Gillings, PhD Student | Associate Lecturer & Assistant Dean, The County College

ESRC Centre for Corpus Approaches to Social Science (CASS)

Department of Linguistics and English Language

B09, FASS Building, Lancaster University, LA1 4YW

+44(0) 1524 593653

Twitter<https://twitter.com/mathewgillings> | LinkedIn<https://www.linkedin.com/in/mathewgillings/> | CASS<http://cass.lancs.ac.uk/> | Shakespeare<http://wp.lancs.ac.uk/shakespearelang/> | Deanery<https://www.thecountycollege.com/welfare-deanery>

***As part of an ESRC-funded research project, we are collecting SMS/WhatsApp/Facebook messages and email communication from native speakers of English for inclusion in the British National Corpus 2014, a new 100-million-word dataset. If you are interested in contributing, get in touch with me, forward your sent emails, or CC writtenbnc2014 at gmail.com<mailto:writtenbnc2014 at gmail.com> in your reply!***

