[Corpora-List] WebIsALOD - Large-scale Hypernymy Dataset Released

Koos Wilt kooswilt at gmail.com
Thu May 18 16:35:58 CEST 2017


Dr Paulheim's post nicely dovetails with the content of the following two links:

http://www.sciencedirect.com/science/article/pii/S1532046403001175

https://semrep.nlm.nih.gov/

The first link discusses the role of hypernyms, the subject of Dr Paulheim's post, and shows that this kind of effort has long been in the making. My code combines an ensemble of SUBJECT PREDICATE OBJECT triples, POS tagging, and a hypernym-expanded version of the SUBJECT PREDICATE OBJECT triples to raise correct classification from 343/400 (85.75%) to 377/400 (94.25%). It is essentially a primitive-ish implementation of what these two links describe.
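
To illustrate the idea of hypernym expansion (this is only a minimal sketch, not the code that produced the 343/400 and 377/400 figures), the Python snippet below expands each triple with hypernyms and compares triples by cosine similarity. The HYPERNYMS table, the function names and the example triples are all hypothetical; a real system would take its hypernyms from a resource such as WordNet or WebIsALOD.

```python
import math
from collections import Counter

# Hypothetical toy hypernym table, for illustration only.
HYPERNYMS = {
    "iphone": ["smartphone", "device"],
    "galaxy": ["smartphone", "device"],
    "poodle": ["dog", "animal"],
}

def expand(triple, hypernyms=HYPERNYMS):
    """Bag of tokens: the triple's terms plus any hypernyms of those terms."""
    tokens = []
    for term in triple:
        term = term.lower()
        tokens.append(term)
        tokens.extend(hypernyms.get(term, []))
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(triple, labeled, hypernyms=HYPERNYMS):
    """Return the label of the most similar labeled triple (first one wins ties)."""
    query = expand(triple, hypernyms)
    best = max(labeled, key=lambda item: cosine(query, expand(item[0], hypernyms)))
    return best[1]

labeled = [
    (("Poodle", "chases", "ball"), "pets"),
    (("Galaxy", "plays", "music"), "tech"),
]
query = ("iPhone", "runs", "apps")
print(classify(query, labeled))      # with hypernym expansion
print(classify(query, labeled, {}))  # without expansion: no shared tokens at all
```

Without expansion the query shares no token with either labeled triple, so every similarity is zero and the choice is arbitrary; with expansion, "iPhone" and "Galaxy" meet at "smartphone" and "device".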

The work Dr Paulheim describes is the material in the first link writ large, and the potential for transforming the web into a functional repository of Linked Data is tremendous, if a headache to organize and keep track of.
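
For readers unfamiliar with the "Hearst-like patterns" mentioned in the announcement quoted below, here is a minimal Python sketch of the general technique. The three patterns and the function name are my own assumptions for illustration; they are not the pattern set used to build WebIsA, which is described in the LREC paper.

```python
import re

# Three illustrative Hearst-style patterns (assumed, not the WebIsA set).
# Each uses named groups "hypo" (the more specific term) and "hyper".
PATTERNS = [
    re.compile(r"\b(?P<hypo>\w+(?: \w+)?) is an? (?P<hyper>\w+)"),
    re.compile(r"\b(?P<hyper>\w+) such as (?P<hypo>\w+)"),
    re.compile(r"\b(?P<hypo>\w+) and other (?P<hyper>\w+)"),
]

def extract_isa(text):
    """Return (hyponym, hypernym) pairs found by the patterns, pattern by pattern."""
    pairs = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            pairs.append((match.group("hypo"), match.group("hyper")))
    return pairs

print(extract_isa("iPhone 4 is a smartphone. We saw poodles and other dogs."))
```

A production pipeline would add proper noun-phrase chunking, many more patterns, and the confidence scoring and provenance tracking the announcement refers to.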

I will attempt to post all the software yielding the 377/400 correct classification to GitHub this evening. If you have more than a passing interest, I am sure you will know how to find it.

Best regards,

-Koos

2017-05-18 12:01 GMT+02:00 Koos Wilt <kooswilt at gmail.com>:


> Heiko and others,
>
>
> I hope my response is appropriate. I am conducting a series of
> experiments to show 'linguistics improves Text Analytics'. (philosophical
> underpinnings: low-hanging fruit in comparison to tedious Neural Net
> Studies; linguistics and statistics are complements, as Ken Church asks us
> to consider in A PENDULUM SWUNG TOO FAR.) Remaining on-topic, one of my
> experiments concerns hypernyms. In an ensemble with POS tagging and
> regular SVO triples, akin and applicable to Semantic Web work, SUBJECT
> PREDICATE OBJECT triples expanded with hypernyms bring correct
> classification from 343 out of 400 (baseline) to 377 out of 400, quite an
> improvement.
>
> My point is that studying hypernyms, and semantics in general, is well
> worth it. And timely: I claim that, with all the parsers now available, we
> have conquered syntax.
>
> I do not have the code for all this ready to present, but here's a taste
> of what I already have but not uploaded to GitHub yet: the code showing the
> improvement of just plain SUBJECT PREDICATE OBJECT triples (343/400 -->
> 363/400). Disclaimer: code not reviewed, written hurriedly for
> proof-of-concept.
>
> https://github.com/Koos12/Cosine_sim_Python-w.n.w.o.Parser
>
>
> Best regards,
>
>
> -Koos
>
>
>
> 2017-05-18 10:25 GMT+02:00 Heiko Paulheim <heiko at informatik.uni-mannheim.de>:
>
>> Dear all,
>>
>> the Data and Web Science group at University of Mannheim is happy to
>> announce the first release of the WebIsA database [1] as a Linked Open Data
>> endpoint. The dataset contains 11.7 million hypernym or subsumption
>> relations ("is a") collected from the Web (e.g., "iPhone 4 is a
>> smartphone"), using a set of Hearst-like patterns (see the paper [2] for
>> details). We provide the data together with confidence scores, rich
>> provenance information, as well as interlinks to DBpedia and YAGO. All in
>> all, the dataset contains more than 470M triples.
>>
>> The dataset is available at [3] as a Linked Data endpoint, a SPARQL
>> endpoint, and downloadable dumps.
>>
>> All the best,
>> Sven Hertling
>> Heiko Paulheim
>>
>> [1] http://webdatacommons.org/isadb
>> [2] Julian Seitner, Christian Bizer, Kai Eckert, Stefano Faralli, Robert
>> Meusel, Heiko Paulheim and Simone Paolo Ponzetto: A Large Database of
>> Hypernymy Relations Extracted from the Web. In: LREC 2016.
>> [3] http://webisa.webdatacommons.org/
>>
>>
>> --
>> Prof. Dr. Heiko Paulheim
>> Data and Web Science Group
>> University of Mannheim
>> Phone: +49 621 181 2652
>> B6, 26, Room B1.16
>> D-68159 Mannheim
>>
>> Mail: heiko at informatik.uni-mannheim.de
>> Web: www.heikopaulheim.com
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>
>