[Corpora-List] Danish Gigaword v1.0 release

Leon Derczynski leonderczynski at gmail.com
Thu Jun 3 07:32:23 CEST 2021

This week marks the release of Danish Gigaword v1.0, with over 1,000,000,000 words of Danish, spanning centuries, dialects, registers, modalities, and domains. It is the largest single collection of openly licensed documents in Danish, and we hope it helps bring the language up from an underprivileged one to a well-resourced one.

Links:
* The DAGW homepage, https://gigaword.dk/ , where there's a download link and license information;
* The paper in the ACL anthology, https://www.aclweb.org/anthology/2021.nodalida-main.46/

Thank you for your interest.


Leon Derczynski (IT University of Copenhagen)
Manuel R. Ciosici (University of Southern California / IT University of Copenhagen)
