[Corpora-List] WebIsALOD - Large-scale Hypernymy Dataset Released

Koos Wilt kooswilt at gmail.com
Thu May 18 12:01:47 CEST 2017

Heiko and others,

I hope my response is appropriate. I am conducting a series of experiments to show that 'linguistics improves Text Analytics'. (Philosophical underpinnings: this is low-hanging fruit compared with tedious neural net studies, and linguistics and statistics are complements, as Ken Church asks us to consider in "A Pendulum Swung Too Far".) To remain on topic: one of my experiments concerns hypernyms. In an ensemble with POS tagging and regular SVO triples, which are akin and applicable to Semantic Web work, SUBJECT PREDICATE OBJECT triples expanded with hypernyms bring correct classification from 343 out of 400 (baseline, 85.75%) to 377 out of 400 (94.25%), quite an improvement.
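To make "triples expanded with hypernyms" concrete, here is a toy sketch of the idea. It is not the experimental code: the hypernym table, the feature names and the function `expand_triple` are all hypothetical, and a real setup would presumably draw hypernyms from a resource such as WordNet or an "is a" database and feed the features to a classifier.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy hypernym table, for illustration only.
HYPERNYMS: Dict[str, List[str]] = {
    "iphone": ["smartphone", "device"],
    "dog": ["animal"],
}


def expand_triple(triple: Triple,
                  hypernyms: Dict[str, List[str]] = HYPERNYMS) -> List[str]:
    """Turn an SVO triple into string features, adding hypernym features
    for the subject and the object where the table has an entry."""
    subj, pred, obj = (term.lower() for term in triple)
    features = [f"S={subj}", f"P={pred}", f"O={obj}"]
    features += [f"S_HYPER={h}" for h in hypernyms.get(subj, [])]
    features += [f"O_HYPER={h}" for h in hypernyms.get(obj, [])]
    return features


if __name__ == "__main__":
    print(expand_triple(("iPhone", "replaces", "dog")))
```

The point of the expansion is that two texts mentioning different subjects (say, two different phone models) can still share a feature such as S_HYPER=smartphone, which plain triples would not give you.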

My point is that studying hypernyms, and semantics in general, is well worth it. And it is timely: I claim that, with all the parsers now available, we have conquered syntax.

I do not have the code for all of this ready to present, but here is a taste of what I already have but have not yet uploaded to GitHub: the code showing the improvement from just plain SUBJECT PREDICATE OBJECT triples (343/400 --> 363/400). Disclaimer: the code has not been reviewed and was written hurriedly as a proof of concept.


Best regards,


2017-05-18 10:25 GMT+02:00 Heiko Paulheim <heiko at informatik.uni-mannheim.de>

> Dear all,
> the Data and Web Science group at University of Mannheim is happy to
> announce the first release of the WebIsA database [1] as a Linked Open Data
> endpoint. The dataset contains 11.7 million hypernym or subsumption
> relations ("is a") collected from the Web (e.g., "iPhone 4 is a
> smartphone"), using a set of Hearst-like patterns (see the paper [2] for
> details). We provide the data together with confidence scores, rich
> provenance information, as well as interlinks to DBpedia and YAGO. All in
> all, the dataset contains more than 470M triples.
> The dataset is available at [3] as a Linked Data endpoint, a SPARQL
> endpoint, and downloadable dumps.
> All the best,
> Sven Hertling
> Heiko Paulheim
> [1] http://webdatacommons.org/isadb
> [2] Julian Seitner, Christian Bizer, Kai Eckert, Stefano Faralli, Robert
> Meusel, Heiko Paulheim and Simone Paolo Ponzetto: A Large Database of
> Hypernymy Relations Extracted from the Web. In: LREC 2016.
> [3] http://webisa.webdatacommons.org/
> --
> Prof. Dr. Heiko Paulheim
> Data and Web Science Group
> University of Mannheim
> Phone: +49 621 181 2652
> B6, 26, Room B1.16
> D-68159 Mannheim
> Mail: heiko at informatik.uni-mannheim.de
> Web: www.heikopaulheim.com
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora