[Corpora-List] Corpus Linguistics in the South 4: Hands-on workshop

Charlotte Taylor Charlotte.Taylor at port.ac.uk
Wed Aug 29 11:20:30 CEST 2012


We are pleased to announce that the next Corpus Linguistics in the South will be hosted by the University of Portsmouth on Saturday 10 November. It will be a practical hands-on workshop with software which may be useful to corpus linguists. The programme and description of the sessions are copied below.

As always, attendance is free, but places are limited and will be assigned on a first-come, first-served basis. If you would like to attend, please email charlotte.taylor at port.ac.uk. Please also say whether you would like to join us for lunch at a local cafe/restaurant (max 10).

Programme

9.15  Welcome coffee
9.30  Sketch Engine: Advanced workshop
      Adam Kilgarriff, Lexcom Computing, Brighton
11.00 EXMARaLDA (Extensible Markup Language for Discourse Annotation)
      Daniel Jettka, Hamburg Centre for Spoken Corpora, Germany
13.00 Lunch
14.15 CHILDES (Child Language Data Exchange System)
      Kevin McManus, University of Southampton
15.45 Unix for Corpus Users
      John Williams, University of Portsmouth
17.15 Arrangement of next two Corpus Linguistics in the South events & Close

Sketch Engine: Advanced Workshop
This will be an opportunity for people with some experience of Sketch Engine to see and try out some more advanced features, and also to ask any questions, particularly of the 'How do I do X?' variety. As with most software, most users are only aware of a small fraction of what the software offers, and find it rewarding to have their repertoire extended. My usual experience with workshops of this kind is that there are many instances of wide-eyed looks which say "Ah, so THAT is how you do that!" Come prepared with any queries or reports you want to be able to run but are not sure how to, and we'll work them out together in the workshop.

Introduction to EXMARaLDA
The workshop will introduce EXMARaLDA ("Extensible Markup Language for Discourse Annotation"), a system of concepts, data formats, and tools for the computer-assisted transcription and annotation of spoken language, and for the construction and analysis of spoken language corpora. During the workshop three related tools will be introduced: (1) the Partitur Editor, a tool for inputting, editing, and outputting transcriptions in partitur (musical score) notation; (2) the Corpus Manager (CoMa), which is designed to merge transcripts created with the Partitur Editor with their corresponding recordings into corpora and to enrich them with metadata; and (3) the query tool EXAKT ("EXMARaLDA Analysis and Concordancing Tool") for searching transcribed and annotated phenomena in an EXMARaLDA corpus. After a brief introduction, the participants will have the chance to gain some practical experience with the tools. The focus will presumably be on the transcription and annotation of audio and/or video data in the Partitur Editor, so please feel free to bring along your own data for testing. To find out more about EXMARaLDA, visit http://www.exmaralda.org/en_index.html

Introduction to CHILDES
The overall purpose of the session is to provide practical, hands-on experience of the CHILDES database and its tools for researchers working in any field of language acquisition. In particular, we aim: a) to introduce researchers who are unfamiliar with CHILDES, but planning to do empirical work, to the basics of transcription and coding of new and existing material and to the tools available to analyse data; b) to help researchers address specific research questions within CHILDES (e.g. use of the part-of-speech tagger, searches on morphosyntactic lines, etc.).

Introduction to Unix for Corpus Users
This workshop is intended for corpus users with little or no knowledge of the Unix command line who would like to extend their repertoire of searching, sorting, and synthesizing techniques beyond those that are available through the standard corpus-query software packages (SketchEngine, AntConc, Wordsmith, etc.). The workshop will be divided into three phases:

a) Some basics: options, input & output, pipes, file management, aliases, .rc files.

b) The most useful Unix commands for corpus linguists: cat, grep, sed, sort, uniq. (We will chain some of these together to create a customized word list with frequencies.) Some of these commands are integrated into the standard packages, but by using them at the command line their range and flexibility can be greatly extended. This part of the workshop will also include a discussion of regular expressions.

c) It is hoped to be able to demonstrate a simple Unix shell script (program) which will convert batches of .doc and .pdf files to .txt, to aid participants in building their own corpora. This tool will be available to take away (or to be sent by email) at the end of the workshop.
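For readers who want a preview of the command chain mentioned in phase (b), here is a minimal sketch of a word-frequency pipeline built only from cat, sed, grep, sort and uniq. It is an illustration under stated assumptions, not the workshop's own material: the function name word_freq is hypothetical, the input is assumed to be plain ASCII English text, and `grep -o` is a GNU/BSD extension rather than strict POSIX.

```shell
#!/bin/sh
# word_freq (hypothetical name): print "count word" lines for the given
# text files, most frequent first, ties broken alphabetically.
# Assumes ASCII text; accented letters would be treated as word breaks.
word_freq() (
    # Run in a subshell with the C locale so sorting is byte-wise and stable.
    LC_ALL=C
    export LC_ALL

    cat "$@" |
        # Fold upper case to lower case so "The" and "the" count together.
        sed 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/' |
        # Emit one word per line; [a-z]+ is a simple regular expression.
        grep -oE '[a-z]+' |
        # uniq only merges adjacent duplicates, so sort first.
        sort |
        uniq -c |
        # Order by count (numeric, descending), then by word.
        sort -k1,1nr -k2,2
)
```

With a folder of plain-text files, a call such as `word_freq corpus/*.txt | head -20` would list the twenty most frequent word forms. The `[a-z]+` pattern is the place to experiment: changing it changes what counts as a word, for example whether apostrophes or hyphens are kept inside tokens.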

--------------------------------------------------
Year 1 Tutor, SLAS
Senior Lecturer in English Language and Linguistics

School of Languages and Area Studies
University of Portsmouth
Park Building
King Henry I Street
Portsmouth PO1 2DZ

Room 4.31, Tel. 023 92 846161
http://www.port.ac.uk/departments/academic/slas/staff/title,103868,en.html
http://port.academia.edu/CharlotteTaylor
