To see how SUBJECT PREDICATE OBJECT triples improve classification, run classify.py and 1classify.py and compare the numbers (343/400 correct vs. 363/400 correct). The triples are not (yet) in the prescribed URI-based Semantic Web format, but they are convertible to it. The code is at: https://github.com/Koos12/Cosine_sim_Python-w.n.w.o. Parser
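For what "convertible" could look like, here is a minimal sketch that writes a plain SUBJECT PREDICATE OBJECT triple as one N-Triples line. It is not taken from the repository; the namespace http://example.org/ and the helper names to_uri and to_ntriple are made up for illustration.

```python
from urllib.parse import quote

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/"

def to_uri(term, base=BASE):
    """Turn a plain-text term into a bracketed URI under the given namespace."""
    slug = "_".join(term.strip().split())  # collapse whitespace to underscores
    return "<" + base + quote(slug) + ">"

def to_ntriple(subject, predicate, obj, base=BASE):
    """Serialize one SPO triple as a single N-Triples line."""
    return " ".join(to_uri(t, base) for t in (subject, predicate, obj)) + " ."

print(to_ntriple("iPhone 4", "is a", "smartphone"))
# <http://example.org/iPhone_4> <http://example.org/is_a> <http://example.org/smartphone> .
```

A real conversion would map terms onto existing vocabularies (DBpedia, for instance) rather than mint URIs like this.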
This code, slightly adjusted from the original, shows how POS tags (linguistics) and SPO triples (linguistics) enhanced with hypernyms (linguistics/semantics) increase classification performance.
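To make the idea concrete, here is a rough sketch of why triple and hypernym features can help a cosine-similarity classifier. It is not the code in classify.py: the toy HYPERNYMS table, the feature naming, and the already-extracted triples are all assumptions (a real run would take triples from a parser and hypernyms from a resource such as WordNet).

```python
import math
from collections import Counter

# Hypothetical toy hypernym table, for illustration only.
HYPERNYMS = {"dog": "animal", "cat": "animal", "car": "vehicle", "truck": "vehicle"}

def features(tokens, triples):
    """Bag of words plus one feature per SPO triple and per hypernym-expanded triple."""
    feats = Counter(tokens)
    for s, p, o in triples:
        feats["SPO:%s|%s|%s" % (s, p, o)] += 1
        hs, ho = HYPERNYMS.get(s, s), HYPERNYMS.get(o, o)
        if (hs, ho) != (s, o):  # only add the expanded triple if it differs
            feats["SPO:%s|%s|%s" % (hs, p, ho)] += 1
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

tokens1, tokens2 = ["dog", "chases", "car"], ["cat", "chases", "truck"]
plain = cosine(Counter(tokens1), Counter(tokens2))
enriched = cosine(features(tokens1, [("dog", "chases", "car")]),
                  features(tokens2, [("cat", "chases", "truck")]))
print(plain, enriched)  # the enriched score is higher than the plain one
```

The two toy sentences share only one word, but after hypernym expansion they also share the triple animal|chases|vehicle, which pulls them closer together.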
A draft of a paper discussing similar work on 20newsgroups is at:
I hope all this makes sense; comments are welcome.
2017-05-18 16:35 GMT+02:00 Koos Wilt <kooswilt at gmail.com>:
> Dr Paulheim's post nicely dovetails with the content of the following two
> links.
> The first link discusses the role of hypernyms, the subject of Dr
> Paulheim's post. We see this kind of effort has long been in the making.
> My code shows how an ensemble of SUBJECT PREDICATE OBJECT triples, POS
> tagging, and hypernym-expanded SUBJECT PREDICATE OBJECT triples increases
> correct classification from 343/400 to 377/400; it is essentially a
> primitive-ish implementation of these two links.
> The work Dr Paulheim talks about is the stuff in the first link writ
> large, and the potential for transforming the web into a functional
> repository of Linked Data is tremendous, if a headache to organize and keep
> track of.
> I will attempt to post all the software yielding the 377/400 correct
> classification to GitHub this evening. If you have more than a passing
> interest, I am sure you will know how to find it.
> Best regards,
> 2017-05-18 12:01 GMT+02:00 Koos Wilt <kooswilt at gmail.com>:
>> Heiko and others,
>> I hope my response is appropriate. I am conducting a series of
>> experiments to show 'linguistics improves Text Analytics'. (philosophical
>> underpinnings: low-hanging fruit in comparison to tedious Neural Net
>> Studies; linguistics and statistics are complements, as Ken Church asks us
>> to consider in A PENDULUM SWUNG TOO FAR.) Remaining on-topic, one of my
>> experiments concerns hypernyms. In an ensemble with POS tagging, and
>> regular SVO Triples, akin and applicable to Semantic Web stuff, SUBJECT
>> PREDICATE OBJECT triples expanded with hypernyms bring correct
>> classification from 343 out of 400 (baseline) to 377/400, quite an
>> My point is that studying hypernyms, and semantics in general, is well
>> worth it. And timely: I claim that, with all the parsers now available,
>> we have conquered syntax.
>> I do not have the code for all this ready to present, but here's a taste
>> of what I already have but not uploaded to GitHub yet: the code showing the
>> improvement of just plain SUBJECT PREDICATE OBJECT triples (343/400 -->
>> 363/400). Disclaimer: code not reviewed, written hurriedly for
>> Best regards,
>> 2017-05-18 10:25 GMT+02:00 Heiko Paulheim <heiko at informatik.uni-mannheim
>>> Dear all,
>>> the Data and Web Science group at University of Mannheim is happy to
>>> announce the first release of the WebIsA database [1] as a Linked Open Data
>>> endpoint. The dataset contains 11.7 million hypernym or subsumption
>>> relations ("is a") collected from the Web (e.g., "iPhone 4 is a
>>> smartphone"), using a set of Hearst-like patterns (see the paper [2] for
>>> details). We provide the data together with confidence scores, rich
>>> provenance information, as well as interlinks to DBpedia and YAGO. All in
>>> all, the dataset contains more than 470M triples.
>>> The dataset is available at [3] as a Linked Data endpoint, a SPARQL
>>> endpoint, and downloadable dumps.
>>> All the best,
>>> Sven Hertling
>>> Heiko Paulheim
>>> [1] http://webdatacommons.org/isadb
>>> [2] Julian Seitner, Christian Bizer, Kai Eckert, Stefano Faralli, Robert
>>> Meusel, Heiko Paulheim and Simone Paolo Ponzetto: A Large Database of
>>> Hypernymy Relations Extracted from the Web. In: LREC 2016.
>>> [3] http://webisa.webdatacommons.org/
>>> Prof. Dr. Heiko Paulheim
>>> Data and Web Science Group
>>> University of Mannheim
>>> Phone: +49 621 181 2652
>>> B6, 26, Room B1.16
>>> D-68159 Mannheim
>>> Mail: heiko at informatik.uni-mannheim.de
>>> Web: www.heikopaulheim.com
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora at uib.no