[Corpora-List] English dataset about >200K human activities and processes

Paolo Pareti P.Pareti at sms.ed.ac.uk
Tue Apr 26 18:30:57 CEST 2016

Dear Corpora Members,

Our Linked Data dataset describing over 200,000 human activities is now online: https://w3id.org/knowhow/dataset

This dataset is particularly well suited to activity recognition tasks, common sense reasoning about human activities, and finding correlations between the entities typically involved in certain procedures.

This dataset contains over 2.5M labelled entities (in English) that describe human activities and instructions. Entities (e.g. the steps/methods/requirements/outputs of a process) are clearly annotated according to the model defined by the PROHOW vocabulary (http://w3id.org/prohow). An example of what this model looks like is available at this link: http://paolopareti.uk/prohow/PROHOW_DataModel_Example.pdf
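
As a rough illustration of the kind of structure described above (not taken from the dataset itself), a process with its requirements and steps could be held as plain triples and read back as sketched here. The property names `requires` and `has_step`, the `#` namespace form and the `ex:` identifiers are assumptions for illustration and should be checked against the PROHOW vocabulary before use.

```python
# Hypothetical sketch: the property names, namespace form and example
# identifiers below are assumptions, not values copied from the dataset.
PROHOW = "http://w3id.org/prohow#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# A tiny hand-written graph of (subject, predicate, object) triples.
triples = [
    ("ex:make_tea", RDFS_LABEL, "How to make tea"),
    ("ex:make_tea", PROHOW + "requires", "ex:tea_bag"),
    ("ex:make_tea", PROHOW + "has_step", "ex:boil_water"),
    ("ex:make_tea", PROHOW + "has_step", "ex:steep_tea"),
    ("ex:tea_bag", RDFS_LABEL, "Tea bag"),
    ("ex:boil_water", RDFS_LABEL, "Boil the water"),
    ("ex:steep_tea", RDFS_LABEL, "Steep the tea bag"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`, in list order."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def label(entity):
    """Return the first label of `entity`, or the entity itself if unlabelled."""
    labels = objects(entity, RDFS_LABEL)
    return labels[0] if labels else entity


def describe(process):
    """Collect the labels of a process, its requirements and its steps."""
    return {
        "process": label(process),
        "requirements": [label(r) for r in objects(process, PROHOW + "requires")],
        "steps": [label(s) for s in objects(process, PROHOW + "has_step")],
    }


print(describe("ex:make_tea"))
```

With the real data, the same lookups would more likely be expressed as queries in an RDF library or a SPARQL endpoint; the tuple list is used here only to keep the sketch free of dependencies.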

This dataset also includes the results of an existing NLP study that decomposed 250K inputs/outputs of the activities and disambiguated them to DBpedia resources.

You can download the whole dataset and find more information about it on Datahub (https://datahub.io/dataset/human-activities-and-instructions).

For any queries, feel free to contact me.


Paolo Pareti
School of Informatics
University of Edinburgh
https://w3id.org/people/paolo

-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
