We collected around 366k labelled patents from the European Patent Organization. The roughly 700 labels used in the task are a subset of an ontology of about 70k labels, each of which comes with a label description. The patents are written in English (75%), German (20%) and French (5%). The goal is to assign each patent multiple labels from the ontology (hierarchical multi-label classification). The task is divided into two subtasks: one evaluates a multilingual hierarchical multi-label classification (MHMC) system in the standard setting, and the other addresses the zero-/few-shot scenario that often arises in datasets with large label sets.
Subtask A: Classify patents in a standard multilingual hierarchical multi-label document classification setup with a large number of patents.
Subtask B: This subtask requires a zero-shot/few-shot approach, since some labels in the test set have very few or even zero training samples. For this subtask, we provide the ontology together with the class descriptions.
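To make the Subtask B setting concrete, the sketch below counts how often each label occurs in the training data and separates zero-shot labels (never seen in training) from few-shot labels (seen only a handful of times). It assumes each patent's labels are available as a set of strings; the function name `split_label_regimes`, the label names and the `few_shot_max` cutoff are hypothetical, since the task description does not define an exact threshold.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def split_label_regimes(
    train_labels: Iterable[Set[str]],
    test_label_space: Iterable[str],
    few_shot_max: int = 5,  # hypothetical cutoff; the task does not define one
) -> Tuple[Set[str], Set[str]]:
    """Return (zero_shot, few_shot) labels according to their training frequency."""
    # Count how many training patents carry each label.
    counts = Counter(label for labels in train_labels for label in labels)
    test_labels = set(test_label_space)
    # Labels never seen during training: only the ontology description is available.
    zero_shot = {lab for lab in test_labels if counts[lab] == 0}
    # Labels seen at least once but no more than few_shot_max times.
    few_shot = {lab for lab in test_labels if 0 < counts[lab] <= few_shot_max}
    return zero_shot, few_shot


train = [{"A", "A.1"}, {"A"}, {"B"}]  # hypothetical training label sets
zero, few = split_label_regimes(train, ["A", "A.1", "B", "C"], few_shot_max=1)
print(sorted(zero), sorted(few))  # ['C'] ['A.1', 'B']
```

For zero-shot labels, the class descriptions from the ontology are the only available signal, which is why they are provided for this subtask.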
The evaluation measure is the harmonic mean of the micro- and macro-averaged F1 scores. Test samples belonging to Subtask B will not be considered when scoring Subtask A, i.e., the Subtask A test set is a subset of the Subtask B test set.
15 January 2020: Start of Competition
17 March 2020: Start of Test Phase
24 March 2020: End of Competition
31 March 2020: Results Announcement
14 April 2020: System Description Submission
28 April 2020: Notification of Acceptance
5 May 2020: Camera-Ready Submission
23-25 June 2020: Presentation of Results at SwissText & KONVENS Joint Conference
For further information and updates, please check:
Dr. Fernando Benites (Zurich University of Applied Sciences (ZHAW), Switzerland)
Dr. Ahmad Aghaebrahimian (Zurich University of Applied Sciences (ZHAW), Switzerland)
Steffen Remus (University of Hamburg, Germany)
Prof. Dr. Mark Cieliebak (Zurich University of Applied Sciences (ZHAW), Switzerland)