[Corpora-List] Canadian English corpora

Filip Miletic filip.miletic at univ-tlse2.fr
Fri Jan 18 11:41:15 CET 2019


Dear Vlado,

Thank you for your reply, much appreciated!

Best, Filip Miletic

On Tue, 15 Jan 2019 at 14:15, Vladimír Benko <vladimir.benko at juls.savba.sk> wrote:


> Dear Filip,
>
> The Canadian part of our Araneum Anglicum Maximum web corpus contains
> approx. 210 M tokens. It is accessible online via the NoSketch Engine corpus
> manager (see the link below). The Canadian subcorpus has been defined in a
> rather rudimentary way, i.e., by means of the .ca TLD. More
> sophisticated geolocation for our web corpora is an item on our wishlist
> for the near future (and not only for the Canadian data ;-)
>
> Best,
>
> Vlado B, 14:10
>
> --
> Vladimír Benko
>
> Slovak Academy of Sciences
> Ľ. Štúr Institute of Linguistics
> Panská 26, SK-81101 Bratislava
>
> Tel +421-2-54431762 Fax -54431756
>
> http://aranea.juls.savba.sk/guest/
> https://www.facebook.com/araneawebcorpora/
>
>
>
> Dear all,
>
> I am working on a research project which aims to examine semasiological
> variation across Canadian English dialect regions using distributional
> semantic analysis. I am trying to assemble a corpus which is sufficiently
> large for distributional analysis, but which ideally also provides some
> sociolinguistic information.
>
> I was wondering if anyone is aware of any CanE corpora in addition to
> those in the list below. I would also appreciate any information on the
> corpora that are included in the list, but are not publicly available.
>
> - Strathy (https://corpus.byu.edu/can/): 50m words, ‘balanced’ genres
> - iWeb (https://corpus.byu.edu/iweb/): 308m words, most frequent
>   websites
> - NOW (https://corpus.byu.edu/now/): 957m words, online newspapers
> - GloWbE (https://corpus.byu.edu/glowbe/): 134m words, websites and
>   blogs
> - CORE (https://corpus.byu.edu/core/): 3m? words, websites
> - ICE-Canada (http://ice-corpora.net/ice/): 1m words, variety of
>   genres; download not working
> - SCVE (http://web.uvic.ca/~adarcy/SLRL.htm): sociolinguistic
>   interviews, 162 speakers; availability?
> - TEA (http://individual.utoronto.ca/tagliamonte/lab.html): 1.5m
>   words, sociolinguistic interviews, 199 speakers; availability?
>
> (For corpora covering multiple countries, token counts are my best
> estimates for the Canadian section of the corpus. SCVE = Synchronic Corpus
> of Victoria English; TEA = Toronto English Archive.)
>
> Thank you in advance!
>
> Best regards,
> Filip Miletic
>
> Doctorant contractuel
> Laboratoire CLLE-ERSS – UMR 5263 CNRS
> Université Toulouse - Jean Jaurès (France)
> https://clle.univ-tlse2.fr/accueil/annuaire/filip-miletic--568161.kjsp
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> https://mailman.uib.no/listinfo/corpora
>
>