[Corpora-List] Canadian English corpora

Filip Miletic filip.miletic at univ-tlse2.fr
Tue Jan 15 13:49:30 CET 2019

Dear all,

I am working on a research project which aims to examine semasiological variation across Canadian English dialect regions using distributional semantic analysis. I am trying to assemble a corpus which is sufficiently large for distributional analysis, but which ideally also provides some sociolinguistic information.

I was wondering if anyone is aware of any CanE corpora in addition to those in the list below. I would also appreciate any information on the corpora in the list that are not publicly available.

- Strathy (https://corpus.byu.edu/can/): 50m words, ‘balanced’ genres

- iWeb (https://corpus.byu.edu/iweb/): 308m words, most frequent websites

- NOW (https://corpus.byu.edu/now/): 957m words, online newspapers

- GloWbE (https://corpus.byu.edu/glowbe/): 134m words, websites and blogs

- CORE (https://corpus.byu.edu/core/): 3m? words, websites

- ICE-Canada (http://ice-corpora.net/ice/): 1m words, variety of genres; download not working

- SCVE (http://web.uvic.ca/~adarcy/SLRL.htm): sociolinguistic interviews, 162 speakers; availability?

- TEA (http://individual.utoronto.ca/tagliamonte/lab.html): 1.5m words, sociolinguistic interviews, 199 speakers; availability?

(For corpora covering multiple countries, token counts are my best estimates for the Canadian section of the corpus. SCVE = Synchronic Corpus of Victoria English; TEA = Toronto English Archive.)

Thank you in advance!

Best regards, Filip Miletic

Doctorant contractuel
Laboratoire CLLE-ERSS – UMR 5263 CNRS
Université Toulouse - Jean Jaurès (France)
https://clle.univ-tlse2.fr/accueil/annuaire/filip-miletic--568161.kjsp
