[Corpora-List] Jena Organization Corpus (JOCo) Release

Sven Buechel sven.buechel at uni-jena.de
Mon Jan 14 13:37:08 CET 2019


We are pleased to announce the release of JOCo, the Jena Organization Corpus. JOCo comprises annual reports (ARs) and corporate social responsibility reports of US-American, British, and German business organizations, i.e., corporations listed in major stock indices such as Dow Jones, S&P 500, and NASDAQ 100 for the USA; FTSE, FTSE AIM 100, and FTSE 250 for Great Britain; and DAX, MDAX, and TecDAX for Germany. All reports are written in English.

JOCo contains over 5000 plain-text reports from 270 companies, covering the years 2000 to 2016 and adding up to over 280 million tokens. The plain-text files were automatically derived from PDF documents, followed by several post-processing steps to repair conversion errors.

The creation of JOCo followed a careful sampling strategy, i.e., balancing different countries of origin, stock indices, and industry branches. JOCo thus allows for a wide variety of analyses and enables meaningful comparisons across different variables.

JOCo is available for non-commercial research activities upon submission of a data use agreement form. Please refer to https://www.orga.uni-jena.de/corpus or our paper

Sebastian G.M. Händschke, Sven Buechel, Jan Goldenstein, Philipp Poschmann, Tinghui Duan, Peter Walgenbach, and Udo Hahn. 2018. A Corpus of Corporate Annual and Social Responsibility Reports: 280 Million Tokens of Balanced Organizational Writing. In ECONLP 2018 – Proceedings of the First Workshop on Economics and Natural Language Processing @ ACL 2018. Melbourne, Australia, July 20, 2018. Pages 20–31. Available: http://aclweb.org/anthology/W18-3103

for additional information.

Kind Regards

Sven Buechel

on behalf of the Jena University Language and Information Engineering (JULIE) Lab (Prof. Udo Hahn) and the Chair of Organization, Leadership and Human Resource Management (Prof. Peter Walgenbach), FSU Jena, Germany

--
Sven Buechel
Doctoral Researcher
Jena University Language and Information Engineering (JULIE) Lab
Friedrich-Schiller-Universität Jena
Fürstengraben 27, 07743 Jena, Germany
https://julielab.de/Staff/Buechel/


