At first sight, XML appears to be all about trees. And at second glance, reality seems to confirm this misconception:
- The internal structure of XML parsers is based on trees
- Most custom XML formats follow rigid tree structures
- XML critics introduce alternative notations to describe trees
- Poor usage of XPath is restricted to basic tree traversal
Of course you can use XML to model trees. The title of this post is not meant to question that capability. XML is perfectly capable of capturing structured data.
The point is that most XML does not live up to its full potential. The real potential of XML is in semi-structured data. Modeling only structured data with XML means being stuck in second gear.
What is semi-structured data?
The difference between semi-structured data and structured data lies in the degrees of freedom. In semi-structured data you can combine your elements in countless ways. Instead of rearranging your data into an ill-fitting tree format, you can choose an outline that fits the document being modeled.
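To make the distinction a little more concrete, here is a minimal sketch in Python using the standard `xml.etree.ElementTree` module. The element names (`person`, `note`, `name`, `em`, `date`) and the helper `has_mixed_content` are invented for illustration only. The sketch treats "text freely interleaved with elements" as one visible sign of semi-structured data, which is a simplification rather than a formal definition.

```python
import xml.etree.ElementTree as ET

# Structured: every child is a named field in a fixed, record-like layout.
structured = ET.fromstring(
    "<person><name>Ada</name><born>1815</born></person>"
)

# Semi-structured: text and elements are freely interleaved (mixed content).
semi_structured = ET.fromstring(
    "<note>Met <name>Ada</name> in <em>London</em>, "
    "see you on <date>Friday</date>.</note>"
)

def has_mixed_content(element):
    """Return True if non-whitespace text sits next to child elements."""
    if len(element) == 0:
        return False  # a leaf element holds only text, nothing is mixed
    own_text = (element.text or "").strip()
    tails = any((child.tail or "").strip() for child in element)
    return bool(own_text) or tails

structured_is_mixed = has_mixed_content(structured)
semi_is_mixed = has_mixed_content(semi_structured)

# Reading the note back in document order recovers the original sentence.
flat_text = "".join(semi_structured.itertext())
```

In the second fragment the markup follows the sentence, not the other way around: removing the tags still leaves a readable text.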
This may sound abstract and theoretical, or it may appear messy and unpredictable. But before you question the semi-structured approach: are you aware that you already know a semi-structured format, a major standard available on almost any computer anywhere in the world?
The XHTML standard is the best possible example of a semi-structured format. Although browsers parse it into a DOM tree under the hood, its real-world representation is a document. When writing XHTML you can think about markup freely. You're not wasting energy thinking about how to force your data into an inadequate tree.
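As a rough illustration of the "DOM tree under the hood", the following sketch parses a small XHTML-like paragraph with Python's standard `xml.dom.minidom` module and lists the direct children of the `<p>` element. The fragment itself is made up for this example, and a real browser DOM is of course richer than what minidom builds.

```python
from xml.dom import minidom

# A made-up XHTML-style paragraph: inline markup embedded in running text.
fragment = "<p>Write <em>freely</em> and <strong>mix</strong> markup with text.</p>"

doc = minidom.parseString(fragment)
paragraph = doc.documentElement

# Describe each child node: text nodes by their content, elements by tag name.
children = [
    ("text", node.data) if node.nodeType == node.TEXT_NODE
    else ("element", node.tagName)
    for node in paragraph.childNodes
]

# Just the sequence of node kinds, in document order.
kinds = [kind for kind, _ in children]
```

The parser does produce a tree, but the children of the paragraph alternate between text nodes and elements in reading order. The tree is a by-product of parsing a document, not the shape the author had to think in.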
To stress the importance of its semi-structured nature, imagine what XHTML would look like if it were designed according to the tree-based philosophy of most formats:
- <title>Tree structured XHTML</title>
- <title>First chapter</title>
- <title>Example section</title>
- <span>Formatting would have been</span>
- <title>Second chapter</title>