[Corpora-List] CfP: From data to evidence in English language research: Big data, rich data, uncharted data

Tanja Säily tanja.saily at helsinki.fi
Mon Oct 6 15:38:55 CEST 2014


From data to evidence in English language research: Big data, rich data, uncharted data

***Conference in Helsinki, Finland, 19-22 October 2015***

To diversify the discussion of data explosion in the humanities, the Research Unit for Variation, Contacts and Change in English (VARIENG) is organising an academic conference that addresses the use of new data sources, historical and modern, in English language research. We are particularly interested in papers discussing the advantages and disadvantages of the following three kinds of data:

Big data

In recent years, mega-corpora and other large text collections have become increasingly available to linguists. These databases open new opportunities for linguistic research, but they may be problematic in terms of representativeness and contextualisation, and the sheer amount of data may also pose practical problems. We welcome papers drawing on big data, including large corpora representing different genres and varieties (e.g. COCA, GloWbE), databases (e.g. EEBO, ECCO) and corpora created by web crawling (e.g. EnTenTen, UKWaC).

Rich data

Rich data contains more than the texts themselves. It can include representations of spacing, graphical elements, choice of typeface, prosody, or gestures, and it is often supplemented by analytic and descriptive metadata linked to either entire texts or individual textual elements. The benefit of rich data is that it can provide new kinds of evidence about pragmatic, sociolinguistic and even syntactic aspects of linguistic events. Yet creating and using rich data poses considerable challenges. We invite papers on the representation, query, analysis, and visualisation of data consisting of more than linear text.

Uncharted data

Uncharted data comprises material which has not yet been systematically mapped, surveyed or investigated. We wish to draw attention to texts and language varieties which are marginally represented in current corpora, to data sources that exist only on the internet or in manuscript form, and to material compiled for purposes other than linguistic research. We welcome papers discussing the innovative research prospects offered by new, previously unused or even unidentified material for the study of English in various contexts, ranging from communities and networks to social groups and individuals.

Abstracts are invited for 30-minute presentations (including discussion), posters, and corpus and software demonstrations. The deadline for abstracts is 15 February 2015.

The following invited speakers have confirmed their participation:

Professor Mark Davies (Brigham Young University)
Professor Tony McEnery (Lancaster University)
Professor Päivi Pahta (University of Tampere)
Dr Jane Winters (Institute of Historical Research, University of London)

The conference forms part of the programme celebrating the 375th anniversary of the University of Helsinki in 2015 and will be held in the Main Building of the University.

More information on the conference will be available on the conference home page at: http://www.helsinki.fi/varieng/d2e/. Please address any queries to: d2e-conference at helsinki.fi.

-- 
Tanja Säily MA
Postgraduate Student
Research Unit for Variation, Contacts and Change in English (VARIENG)
http://www.helsinki.fi/varieng/people/varieng_saily.html


