JOCo contains over 5,000 plain-text reports by 270 companies, covering the years 2000 to 2016 and adding up to over 280 million tokens. The plain-text files were automatically derived from PDF documents, followed by several post-processing steps to repair conversion errors.
The creation of JOCo followed a careful sampling strategy, i.e., balancing different countries of origin, stock indices and industry branches. JOCo thus allows for a wide variety of analyses and enables meaningful comparisons across different variables.
JOCo is available for non-commercial research activities upon submission of a data use agreement form. Please refer to https://www.orga.uni-jena.de/corpus or our paper
Sebastian G.M. Händschke, Sven Buechel, Jan Goldenstein, Philipp Poschmann, Tinghui Duan, Peter Walgenbach, and Udo Hahn. 2018. A Corpus of Corporate Annual and Social Responsibility Reports: 280 Million Tokens of Balanced Organizational Writing. In ECONLP 2018 – Proceedings of the First Workshop on Economics and Natural Language Processing @ ACL 2018. Melbourne, Australia, July 20, 2018. Pages 20-31. Available: http://aclweb.org/anthology/W18-3103
for additional information.
on behalf of the Jena University Language and Information Engineering (JULIE) Lab (Prof. Udo Hahn) and the Chair of Organization, Leadership and Human Resource Management (Prof. Peter Walgenbach), FSU Jena, Germany
-- Sven Buechel Doctoral Researcher Jena University Language and Information Engineering (JULIE) Lab Friedrich-Schiller-Universität Jena Fürstengraben 27, 07743 Jena, Germany https://julielab.de/Staff/Buechel/