[Corpora-List] Canadian English corpora

Filip Miletic filip.miletic at univ-tlse2.fr
Thu Jan 24 11:06:24 CET 2019


Thank you Ondřej!

Best, Filip Miletic

On Mon, 21 Jan 2019 at 12:05, Ondřej Matuška <ondrej.matuska at sketchengine.co.uk> wrote:


> The Timestamped JSI corpus, which can be accessed in Sketch Engine,
> contains 1 billion tokens from Canada.
>
> https://app.sketchengine.eu/#dashboard?corpname=preloaded%2Feng_jsi_newsfeed_virt
>
> The corpus is generated from the JSI Newsfeed http://newsfeed.ijs.si/
>
> Ondřej
>
>
> *Ondřej Matuška*
> Brighton, UK | Brno, CZ
> sketchengine.co.uk <http://www.sketchengine.co.uk>
>
> On Tue, 15 Jan 2019 at 13:55, Filip Miletic <filip.miletic at univ-tlse2.fr>
> wrote:
>
>> Dear all,
>>
>> I am working on a research project which aims to examine semasiological
>> variation across Canadian English dialect regions using distributional
>> semantic analysis. I am trying to assemble a corpus which is sufficiently
>> large for distributional analysis, but which ideally also provides some
>> sociolinguistic information.
>>
>> I was wondering if anyone is aware of any CanE corpora in addition to
>> those in the list below. I would also appreciate any information on the
>> corpora that are included in the list but are not publicly available.
>>
>> - Strathy (https://corpus.byu.edu/can/): 50m words, ‘balanced’ genres
>> - iWeb (https://corpus.byu.edu/iweb/): 308m words, most frequent
>> websites
>> - NOW (https://corpus.byu.edu/now/): 957m words, online newspapers
>> - GloWbE (https://corpus.byu.edu/glowbe/): 134m words, websites and
>> blogs
>> - CORE (https://corpus.byu.edu/core/): 3m? words, websites
>> - ICE-Canada (http://ice-corpora.net/ice/): 1m words, variety of
>> genres; download not working
>> - SCVE (http://web.uvic.ca/~adarcy/SLRL.htm): sociolinguistic
>> interviews, 162 speakers; availability?
>> - TEA (http://individual.utoronto.ca/tagliamonte/lab.html): 1.5m
>> words, sociolinguistic interviews, 199 speakers; availability?
>>
>> (For corpora covering multiple countries, token counts are my best
>> estimates for the Canadian section of the corpus. SCVE = Synchronic Corpus
>> of Victoria English; TEA = Toronto English Archive.)
>>
>> Thank you in advance!
>>
>> Best regards,
>> Filip Miletic
>>
>> Contract doctoral researcher
>> Laboratoire CLLE-ERSS – UMR 5263 CNRS
>> Université Toulouse - Jean Jaurès (France)
>> https://clle.univ-tlse2.fr/accueil/annuaire/filip-miletic--568161.kjsp