Superficially, the observation doesn't seem deep, but the proposed alternatives seem a lot worse. Let's start with a tree represented as an s-expression. Example:
(S (NP the garage) (VP is (PP next to (NP the house))) .)
There's an "obvious" encoding in XML, which no one actually ever uses:
<S><NP>the garage</NP><VP>is<PP>next to<NP>the house</NP></PP></VP>.</S>
This is slightly less readable, utterly equivalent, and trivial to convert to and from. Not a big deal, I would think. Let's look at one proposed format: eGXL. The encoding would look something like
<node id="s8_1" form="the"/>
<node id="s8_2" form="garage"/>
<node id="s8_3" form="is"/>
<node id="s8_6" form="the"/>
<node id="s8_7" form="house"/>
<node id="p8_1" form="S"/>
<node id="p8_2" form="NP"/> <!-- will be (NP the garage) -->
<node id="p8_3" form="VP"/>
<node id="p8_4" form="PP"/>
<node id="p8_5" form="NP"/> <!-- a different NP than the earlier one -->
<edge from="p8_5" to="s8_6"/> <!-- "the" is part of NP -->
<edge from="p8_5" to="s8_7"/> <!-- "house" is part of NP -->
<edge from="p8_4" to="p8_5"/> <!-- (NP the house) is a part of a PP -->
The above is how you represent a tree in eGXL. Yikes!! It's extremely verbose and utterly opaque; it needs some automated WYSIWYG tool just to view the thing. Debugging becomes difficult and tedious, instead of something that can be done at a glance.
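To make the bookkeeping concrete, here is a minimal Python sketch that flattens the example tree into node and edge records of the kind listed above. The id scheme (s8_N for words, p8_N for phrases, with 8 presumably a sentence number) and the attribute names are simply copied from the listing; they are assumptions, not something checked against the actual eGXL schema.

```python
# The example tree, written as nested (label, children) tuples.
# Words are plain strings; phrases are (label, [children]).
TREE = ("S", [("NP", ["the", "garage"]),
              ("VP", ["is", ("PP", ["next", "to", ("NP", ["the", "house"])])]),
              "."])


def flatten(tree, sent=8):
    """Return (nodes, edges) for a tree, numbering words and phrases separately."""
    nodes, edges = [], []
    counters = {"s": 0, "p": 0}   # "s" = word ids, "p" = phrase ids

    def new_id(kind, form):
        counters[kind] += 1
        ident = f"{kind}{sent}_{counters[kind]}"
        nodes.append((ident, form))
        return ident

    def walk(node):
        if isinstance(node, str):          # a word
            return new_id("s", node)
        label, children = node             # a phrase
        ident = new_id("p", label)
        for child in children:
            edges.append((ident, walk(child)))
        return ident

    walk(tree)
    return nodes, edges


def to_lines(nodes, edges):
    """Render the records in the node/edge style of the listing above."""
    lines = [f'<node id="{i}" form="{form}"/>' for i, form in nodes]
    lines += [f'<edge from="{a}" to="{b}"/>' for a, b in edges]
    return lines


nodes, edges = flatten(TREE)
print("\n".join(to_lines(nodes, edges)))
```

For this one short sentence the flat listing comes to 13 node lines plus 12 edge lines, against a single line for the s-expression.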
TigerXML is more or less the same thing, except they're called "terminals" and "non-terminals" instead of nodes and edges.
To me, the s-expr is vastly superior in both size and readability to what eGXL and TigerXML offer.
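As for the claim that the "obvious" nested XML encoding is trivial to convert to and from, a minimal Python sketch is below. It assumes words and labels contain no parentheses or whitespace, and that labels are valid XML element names; the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SEXPR = "(S (NP the garage) (VP is (PP next to (NP the house))) .)"
XML = "<S><NP>the garage</NP><VP>is<PP>next to<NP>the house</NP></PP></VP>.</S>"


def parse(text):
    """Parse an s-expression into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return read()


def to_sexpr(tree):
    """Tree back to an s-expression string."""
    label, children = tree
    parts = [c if isinstance(c, str) else to_sexpr(c) for c in children]
    return "(" + " ".join([label] + parts) + ")"


def to_xml(tree):
    """Tree to nested XML; adjacent words are separated by one space."""
    label, children = tree
    out, prev_word = [], False
    for child in children:
        if isinstance(child, str):
            out.append((" " if prev_word else "") + escape(child))
            prev_word = True
        else:
            out.append(to_xml(child))
            prev_word = False
    return f"<{label}>{''.join(out)}</{label}>"


def from_xml(elem):
    """ElementTree element back to the (label, children) form."""
    children = (elem.text or "").split()
    for sub in elem:
        children.append(from_xml(sub))
        children.extend((sub.tail or "").split())
    return (elem.tag, children)


tree = parse(SEXPR)
print(to_xml(tree))
print(to_sexpr(from_xml(ET.fromstring(XML))))
```

Both directions go through the same small tree structure, so the two encodings carry exactly the same information.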
Enough of that.
================
Triples. As I mentioned, triples are really, really hot right now on the "semantic web", and not at all procrustean. They've been heavily standardized (see for example http://www.w3.org/TR/rdf-mt/), are widely used for blog syndication, and are being explored for building ontologies (OWL, the "Web Ontology Language", a follow-on to DAML+OIL, etc.) and knowledge bases, queryable using SPARQL and a host of other acronyms that'll make your head spin. Triples promise to be the foundation of web-3.0. So, for example:
(Berlin capital-of Germany)
is a triple one might soon expect to get out of Wikipedia. Here's a listing of just some of the triple databases so far: http://protegewiki.stanford.edu/index.php/Protege_Ontology_Library
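Part of the appeal is how little machinery a triple needs. A rough Python sketch, assuming triples are plain 3-tuples and a query is a pattern with None as a wildcard (this is only a toy stand-in for the idea behind a real query language such as SPARQL, and the second fact is added purely for illustration):

```python
# A tiny in-memory "triple store": just a list of 3-tuples.
TRIPLES = [
    ("Berlin", "capital-of", "Germany"),
    ("Paris", "capital-of", "France"),   # extra fact, for illustration only
]


def match(pattern, triples=TRIPLES):
    """Return all triples matching the pattern; None matches anything."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]


# "What is the capital of Germany?"
print(match((None, "capital-of", "Germany")))
# "List every capital-of fact."
print(match((None, "capital-of", None)))
```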
In my case, my triples are:
Err, maybe this is a really bad example for showing an idiosyncratic treatment of prepositions ... but anyway, these are two triples. How to represent these as XML? Both eGXL and TigerXML again convert these into opaque spaghetti.
I guess RDF is one candidate worth exploring in greater detail, in particular, the N-triple format:
which is -- lo and behold -- just a plain, very readable listing, which might look like, e.g.
<http://opencog.org/relations/1.0/pobj> next_to house
<http://opencog.org/relations/1.0/psubj> next_to garage
and there's even a stunt to abbreviate "http://opencog.org/relations/1.0/psubj" by defining an alias for the common part:
@prefix rel: <http://opencog.org/relations/1.0/>.
so that "rel:pobj next_to house" etc. would be nice and compact. (Note the trailing slash: the alias is expanded by plain concatenation.)
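The aliasing itself is nothing more than string substitution. A minimal Python sketch, assuming the namespace string ends in "/" so that plain concatenation yields .../1.0/pobj; the helper names expand and compact are made up for illustration and are not part of any RDF library:

```python
# Alias table: short name -> namespace. The trailing "/" is an assumption
# needed so that "rel:pobj" expands to ".../relations/1.0/pobj".
PREFIXES = {"rel": "http://opencog.org/relations/1.0/"}


def expand(term, prefixes=PREFIXES):
    """Turn 'rel:pobj' into '<http://.../pobj>'; leave other terms alone."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return f"<{prefixes[prefix]}{local}>"
    return term


def compact(term, prefixes=PREFIXES):
    """Turn '<http://.../pobj>' back into 'rel:pobj' when an alias applies."""
    if term.startswith("<") and term.endswith(">"):
        iri = term[1:-1]
        for name, base in prefixes.items():
            if iri.startswith(base):
                return f"{name}:{iri[len(base):]}"
    return term


line = "rel:pobj next_to house"
print(" ".join(expand(t) for t in line.split()))
```

Running compact over each term of the long form gives back the short form, so nothing is lost either way.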
Anyway, it's more or less trivial to convert
into N-triples, and thence to RDF, or back in the other direction. Oh, there's also the "Notation 3" for RDF triples, see
and also "turtle" (Terse RDF Triple Language)
Caution, I am not an RDF expert -- this is the limit of what I know.