[Corpora-List] Canadian English corpora

Felix Bildhauer felix.bildhauer at fu-berlin.de
Fri Jan 18 12:07:44 CET 2019


Hi Filip,

The ENCOW16 corpus (https://corporafromtheweb.org/encow16/) contains a "Canada" subcorpus (112 M tokens), defined by top-level domain *and* server geo-location. Combining TLD and server geo-location also yields good results for other languages with regional varieties (e.g., Spanish).
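
To illustrate the idea, here is a minimal Python sketch of such a combined filter. It assumes each document record already carries its URL and a country code obtained from a geo-IP lookup of the server; the field names "url" and "geo_country" and the helper names are hypothetical and are not ENCOW16's actual metadata attributes.

```python
from urllib.parse import urlsplit


def tld(url):
    """Return the last label of the URL's host name, lowercased ('' if none)."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower() if "." in host else ""


def in_region(doc, tld_code="ca", country_code="CA"):
    """Keep a document only if BOTH the TLD and the server country match."""
    return (
        tld(doc["url"]) == tld_code
        and doc["geo_country"].upper() == country_code
    )


# Toy records; "geo_country" stands in for the result of a geo-IP lookup.
docs = [
    {"url": "https://www.example.ca/news", "geo_country": "CA"},
    {"url": "https://www.example.ca/blog", "geo_country": "US"},
    {"url": "https://www.example.com/page", "geo_country": "CA"},
]

canada = [d for d in docs if in_region(d)]
```

Only the first record passes, since the other two match on just one of the two criteria. Requiring both trades coverage for precision: pages on .com hosts are lost, but so are .ca pages served from elsewhere.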

Best, Felix


