I am working on a research project which aims to examine semasiological variation across Canadian English dialect regions using distributional semantic analysis. I am trying to assemble a corpus which is sufficiently large for distributional analysis, but which ideally also provides some sociolinguistic information.
I was wondering if anyone is aware of any Canadian English (CanE) corpora in addition to those listed below. I would also appreciate any information on the corpora in the list that are not publicly available.
- Strathy (https://corpus.byu.edu/can/): 50m words, ‘balanced’ genres
- iWeb (https://corpus.byu.edu/iweb/): 308m words, most frequent websites
- NOW (https://corpus.byu.edu/now/): 957m words, online newspapers
- GloWbE (https://corpus.byu.edu/glowbe/): 134m words, websites and blogs
- CORE (https://corpus.byu.edu/core/): 3m? words, websites
- ICE-Canada (http://ice-corpora.net/ice/): 1m words, variety of genres; download not working
- SCVE (http://web.uvic.ca/~adarcy/SLRL.htm): sociolinguistic interviews, 162 speakers; availability?
- TEA (http://individual.utoronto.ca/tagliamonte/lab.html): 1.5m words, sociolinguistic interviews, 199 speakers; availability?
(For corpora covering multiple countries, token counts are my best estimates for the Canadian section of the corpus. SCVE = Synchronic Corpus of Victoria English; TEA = Toronto English Archive.)
Thank you in advance!
Best regards, Filip Miletic
Doctorant contractuel
Laboratoire CLLE-ERSS – UMR 5263 CNRS
Université Toulouse - Jean Jaurès (France)
https://clle.univ-tlse2.fr/accueil/annuaire/filip-miletic--568161.kjsp