GermEval 2019 Task 1 - Shared Task on hierarchical classification of German Blurbs
First Call for Participation
This is the call for participation in the shared task on hierarchical classification of blurbs at GermEval 2019. We invite everyone interested to take part. The shared task is hosted at: https://competitions.codalab.org/competitions/21226.
Hierarchical multi-label classification (HMC) of blurbs is the task of assigning multiple labels to a short descriptive text, where each label is part of an underlying hierarchy of categories. The growing number of available digital documents and the need for more and finer-grained categories call for new, more robust and sophisticated text classification methods. Large datasets often incorporate a hierarchy that can be used to categorize documents at different levels of specificity. Traditional multi-class text classification approaches are thoroughly researched, but they fail to generalize adequately as the amount of data grows and the hierarchies become more specific.
With this task we aim to foster research within this context. The task focuses on classifying German books into their respective hierarchically structured writing genres using short advertisement texts (blurbs) and further meta information such as author, number of pages, release date, etc.
This shared task consists of two subtasks, described below. You can participate in one of them or in both.
Subtask A: The task is to classify German books into one or more of the most general writing genres. Therefore, it can be considered a multi-label classification task. In total, there are 8 classes that can be assigned to a book: Literatur & Unterhaltung, Ratgeber, Kinderbuch & Jugendbuch, Sachbuch, Ganzheitliches Bewusstsein, Glaube & Ethik, Künste, Architektur & Garten.
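To make the multi-label setting of Subtask A concrete, here is a minimal sketch of how a book's genre assignment could be encoded as a 0/1 vector over the eight classes listed above. The helper name `to_multi_hot` and the example book are assumptions made for illustration; the call does not prescribe any particular encoding.

```python
# The eight top-level genres named in the call for Subtask A.
TOP_LEVEL_GENRES = [
    "Literatur & Unterhaltung",
    "Ratgeber",
    "Kinderbuch & Jugendbuch",
    "Sachbuch",
    "Ganzheitliches Bewusstsein",
    "Glaube & Ethik",
    "Künste",
    "Architektur & Garten",
]


def to_multi_hot(genres):
    """Encode a collection of genre names as a 0/1 vector over TOP_LEVEL_GENRES.

    Hypothetical helper: one position per class, 1 if the book has that genre.
    """
    genres = set(genres)
    unknown = genres - set(TOP_LEVEL_GENRES)
    if unknown:
        raise ValueError(f"unknown genres: {sorted(unknown)}")
    return [1 if g in genres else 0 for g in TOP_LEVEL_GENRES]


# Hypothetical book assigned to two genres at once (not taken from the dataset).
example_vector = to_multi_hot({"Ratgeber", "Sachbuch"})
```

Because a book may carry several genres, each position is an independent yes/no decision rather than one choice out of eight.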
Subtask B: The second subtask targets hierarchical multi-label classification into multiple writing genres. In addition to the very general writing genres, further genres of different specificity can be assigned to a book. In total, there are 343 different classes that are hierarchically structured on up to 4 levels.
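As a rough illustration of the hierarchical setting in Subtask B, the sketch below stores a genre hierarchy as a child-to-parent map and expands a set of labels with all of their ancestors. The sub-genre names ("Subgenre A" and so on) are placeholders, not labels from the dataset, and the assumption that a specific genre implies its parent genres is ours; the call does not state how predictions are checked for consistency.

```python
# Toy hierarchy as a child -> parent map; None marks a top-level genre.
# Only the two top-level names come from the call; the sub-genres are placeholders.
PARENT = {
    "Literatur & Unterhaltung": None,
    "Sachbuch": None,
    "Subgenre A": "Literatur & Unterhaltung",
    "Subgenre A1": "Subgenre A",
    "Subgenre B": "Sachbuch",
}


def with_ancestors(labels):
    """Return the labels together with every ancestor up to the top level."""
    expanded = set()
    for label in labels:
        node = label
        while node is not None:
            expanded.add(node)
            node = PARENT[node]
    return expanded


def level(label):
    """Return the hierarchy level of a label, counting top-level genres as 1."""
    depth = 1
    while PARENT[label] is not None:
        label = PARENT[label]
        depth += 1
    return depth


# A book labeled only with a level-3 genre also receives its level-2 and level-1 genres.
expanded_labels = with_ancestors({"Subgenre A1"})
```

With the real label set, the same map would hold 343 entries and `level` would return values between 1 and 4.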
The dataset for this task consists of 20,784 examples in total. Sample data is already available so that participants can get familiar with the data structure of this task. Training data will be released in February and can be downloaded after registering for the shared task. The evaluation of the task will take place in July 2019. More information can be found on the GermEval 2019 website at: https://competitions.codalab.org/competitions/21226
Important Dates
* Jan 2019: Release of trial data
* Feb 01, 2019: Release of training data (train + validation)
* Jun 01, 2019: Release test data
* July 15, 2019: Final submission of test results
* July 30, 2019: Submission of description paper
* Aug, 2019: Workshop in Nürnberg/Erlangen, Germany at the Conference on Natural Language Processing KONVENS 2019 (https://dgfs.de/de/cl/konvens.html)
Organizers
The task is organized by Rami Aly, Steffen Remus and Chris Biemann from Language Technology, Department of Informatics, Universität Hamburg. https://www.inf.uni-hamburg.de/en/inst/ab/lt/home.html
GermEval
GermEval is a series of shared task evaluation campaigns that focus on Natural Language Processing for the German language. GermEval has been conducted four times since 2014 in co-location with KONVENS/GSCL conferences.