Best, Filip Miletic
On Mon, 21 Jan 2019 at 12:05, Ondřej Matuška <ondrej.matuska at sketchengine.co.uk> wrote:
> There are 1 billion tokens from Canada in the Timestamped JSI corpus which
> can be accessed in Sketch Engine.
> The corpus is generated from the JSI Newsfeed http://newsfeed.ijs.si/
> *Ondřej Matuška*
> Brighton, UK | Brno, CZ
> sketchengine.co.uk <http://www.sketchengine.co.uk> | Facebook
> <https://www.facebook.com/SketchEngine/> | LinkedIn
> <https://www.linkedin.com/in/ondrejmatuska> | Twitter
> On Tue, 15 Jan 2019 at 13:55, Filip Miletic <filip.miletic at univ-tlse2.fr> wrote:
>> Dear all,
>> I am working on a research project which aims to examine semasiological
>> variation across Canadian English dialect regions using distributional
>> semantic analysis. I am trying to assemble a corpus which is sufficiently
>> large for distributional analysis, but which ideally also provides some
>> sociolinguistic information.
>> I was wondering if anyone is aware of any CanE corpora in addition to
>> those in the list below. I would also appreciate any information on the
>> corpora that are included in the list, but are not publicly available.
>> - Strathy (https://corpus.byu.edu/can/): 50m words, ‘balanced’ genres
>> - iWeb (https://corpus.byu.edu/iweb/): 308m words, most frequent
>> - NOW (https://corpus.byu.edu/now/): 957m words, online newspapers
>> - GloWbE (https://corpus.byu.edu/glowbe/): 134m words, websites and
>> - CORE (https://corpus.byu.edu/core/): 3m? words, websites
>> - ICE-Canada (http://ice-corpora.net/ice/): 1m words, variety of
>> genres; download not working
>> - SCVE (http://web.uvic.ca/~adarcy/SLRL.htm): sociolinguistic
>> interviews, 162 speakers; availability?
>> - TEA (http://individual.utoronto.ca/tagliamonte/lab.html): 1.5m
>> words, sociolinguistic interviews, 199 speakers; availability?
>> (For corpora covering multiple countries, token counts are my best
>> estimates for the Canadian section of the corpus. SCVE = Synchronic Corpus
>> of Victoria English; TEA = Toronto English Archive.)
>> Thank you in advance!
>> Best regards,
>> Filip Miletic
>> Contract doctoral researcher
>> Laboratoire CLLE-ERSS – UMR 5263 CNRS
>> Université Toulouse - Jean Jaurès (France)
>> Corpora mailing list
>> Corpora at uib.no