XML and Why It Should Die
XML, the “eXtensible Markup Language”, is long overdue a plague, its end of days. I for one look forward to its demise. Maybe you don’t yet know what horrors await within the twisting catacombs of this 20-year-old standard. If so, I do not envy you. Not one bit.
eXtraneous Molesters, Laughing :: A primer
So what is this XML thing anyhow? It’s a markup language. Ever seen raw HTML? Something like this:
<element>
  <subElement property="value">
    <subSubElement>Value String</subSubElement>
  </subElement>
</element>
This sort of thing is known as an “Element Tree”, which, as you might guess, is a tree of elements, sub-elements and so on.
The “original” Element Tree language was “Standard Generalized Markup Language”, or SGML. This was slimmed down by a working group of 11 people, who never once met face-to-face, to create a new, better standard. After debating a variety of names, they settled on XML.
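If you want to see the tree for yourself, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The names DOC and walk are mine, made up for illustration; it just walks the snippet above and lists each element with its attributes and text.

```python
import xml.etree.ElementTree as ET

# The snippet from above, as a string (DOC is an illustrative name).
DOC = """<element>
  <subElement property="value">
    <subSubElement>Value String</subSubElement>
  </subElement>
</element>"""


def walk(node, depth=0):
    """Return one line per element: indentation, tag, attributes, text."""
    text = (node.text or "").strip()
    lines = [f"{'  ' * depth}{node.tag} {node.attrib} {text!r}"]
    for child in node:
        # Each child is itself an element with its own children: a tree.
        lines.extend(walk(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = walk(root)
print("\n".join(tree_lines))
```

Each level of nesting in the markup shows up as one more level of indentation in the output, which is about all "Element Tree" really means.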
The first standard was officially released in 1996 as a draft, and within 2 years had become a recommendation of the World Wide Web Consortium (W3C). At the time of writing, there have been two whole releases, taking us to version 1.1. To put that into context, HTML, which is in the same family of standards, has only been around 3 years longer and is on version 5.1. How on earth XML hasn’t managed to develop faster is beyond me.
So by now you should have some idea of what XML is. Time to find out why I hate it with the passion of several thousand suns.
eXtra Mouldy Leftovers :: The Good Bits
Now, let’s be fair, XML does a decent job of being a markup language, similar to HTML.
When used right, it’s also “self-describing”: just looking at the elements gives you a pretty good idea of what’s going on in the document.
Sadly that’s all it does well. I don’t think I’ve ever seen XML be used for markup either.
eXtravagantly Moist Lepers :: The Rest
Hacked in object-oriented behaviour
Most programming languages these days have OO, or “Object-Oriented” features. All this means is that you can create Objects, which have Attributes; for example, an Object might be a car, an attribute might be “Number of wheels”. This makes perfect sense from a programming point of view, and perhaps even from a markup point of view. Hell, there is limited OO functionality in (X::HT)ML: <car number_of_wheels="5">Car Mk5</car>.
This wasn’t enough for the Hitler-to-bes at the XML standards organisation. They wanted more. And so this horror became reality.
<element xmlns:namespace="http://SOME_SCHEMA">
<namespace:Object></namespace:Object>
</element>
You know I said XML was self-describing? It was. The addition of namespaces and external schemas makes it nearly impossible to know what Object is unless you go read the schema, which may itself refer to other schemas, down the rabbit hole until we hit rock bottom at a schema that references itself or some other horror.
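To see what the prefix actually does to a parser, here is a rough Python sketch with xml.etree.ElementTree. The URI http://example.com/some-schema is a hypothetical stand-in for http://SOME_SCHEMA, and the text "thing" is filler I added. As far as the parser is concerned the URI is only a label: nothing gets fetched, and what Object means still lives in whatever schema documents it.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, standing in for http://SOME_SCHEMA.
NS_URI = "http://example.com/some-schema"

DOC = f"""<element xmlns:namespace="{NS_URI}">
  <namespace:Object>thing</namespace:Object>
</element>"""

root = ET.fromstring(DOC)
child = root[0]

# The parser replaces the prefix with the full URI wrapped in braces.
print(child.tag)

# To find the element again, map any prefix you like to the same URI.
found = root.find("ns:Object", {"ns": NS_URI})
print(found.text)
```

So the short, friendly prefix in the file turns into the full URI in every tag name, and you have to carry that URI around in your own code just to look anything up.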
Inherent Unreadability
All this hacking in of functionality has left XML a mess. I’m just going to copy some STIX XML over. Look at it.
LOOK AT IT.
<stix:STIX_Package
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:stix="http://stix.mitre.org/stix-1"
xmlns:stixCommon="http://stix.mitre.org/common-1"
xmlns:ttp="http://stix.mitre.org/TTP-1"
xmlns:cybox="http://cybox.mitre.org/cybox-2"
xmlns:AddressObject="http://cybox.mitre.org/objects#AddressObject-2"
xmlns:cyboxVocabs="http://cybox.mitre.org/default_vocabularies-2"
xmlns:stixVocabs="http://stix.mitre.org/default_vocabularies-1"
xmlns:example="http://example.com/"
xsi:schemaLocation="
http://stix.mitre.org/stix-1 http://stix.mitre.org/XMLSchema/core/1.1.1/stix_core.xsd
http://stix.mitre.org/Campaign-1 http://stix.mitre.org/XMLSchema/campaign/1.1.1/campaign.xsd
http://stix.mitre.org/Indicator-2 http://stix.mitre.org/XMLSchema/indicator/2.2/indicator.xsd
http://stix.mitre.org/TTP-2 http://stix.mitre.org/XMLSchema/ttp/1.1.1/ttp.xsd
http://stix.mitre.org/default_vocabularies-1 http://stix.mitre.org/XMLSchema/default_vocabularies/1.1.1.0/stix_default_vocabularies.xsd
http://cybox.mitre.org/objects#AddressObject-2 http://cybox.mitre.org/XMLSchema/objects/Address/2.1/Address_Object.xsd"
id="example:STIXPackage-cc0ca596-70e6-4dac-9bef-603166d17db8"
version="1.1.1"
>
  <stix:Observables cybox_major_version="1" cybox_minor_version="1">
    <cybox:Observable id="example:observable-c8c32b6e-2ea8-51c4-6446-7f5218072f27">
      <cybox:Object id="example:object-d7fcce87-0e98-4537-81bf-1e7ca9ad3734">
        <cybox:Properties xsi:type="AddressObject:AddressObjectType" category="ipv4-addr">
          <AddressObject:Address_Value>198.51.100.2</AddressObject:Address_Value>
        </cybox:Properties>
      </cybox:Object>
    </cybox:Observable>
  </stix:Observables>
</stix:STIX_Package>
What even is this any more? What does it do? WHO KNOWS!?
The backend of all
I swear XML is the backend for just about everything I use daily. Web browser? XML. Maltego? XML. Threat Intel stuff? EX EM ELL.
It’s got so bad that there are extensions to XML to make it programmable and all other manner of things that should never have come to be.
Anything we do these days uses XML somewhere, and that’s just not acceptable.
If we’re going to go anywhere with technology, we at least need standards that humans can read. Otherwise we’ll end up with another COBOL where it’s still in use, but all those that ever knew it have long retired or died.
What are we to use instead?
There are a few standards that I rather like, including but not limited to:
JSON
JavaScript Object Notation
{
  "car": {
    "wheels": 4
  }
}
This is currently my favourite standard because of the interoperability with Python’s dict type.
It can be used to store stupidly complex data, and is MUCH more readable than the abortion of XML ever could be.
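Here is a quick sketch of that dict interoperability, using only the standard library's json module. The variable names are illustrative, and the extra wheel is my own addition.

```python
import json

# The car example from above, squeezed onto one line.
TEXT = '{"car": {"wheels": 4}}'

data = json.loads(TEXT)          # JSON object -> plain Python dict
wheels = data["car"]["wheels"]   # ordinary dict indexing, no namespaces

data["car"]["wheels"] = wheels + 1
round_trip = json.dumps(data)    # dict -> JSON text again

print(type(data).__name__, wheels)
print(round_trip)
```

No schema, no prefixes: the parsed document is just nested dicts, lists, strings and numbers.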
YAML
YAML Ain’t Markup Language (originally “Yet Another Markup Language”)
car:
  wheels: 4
YAML is lovely for config files. It’s really structured and can keep lists, dicts and probably a load of other stuff. I’ve only really used it for config files, and it seems nice. It’s incredibly strict on layout though: any incorrect indentation and you’ll have yourself an invalid document, or worse, a valid one that means something else.
Conclusion
XML is what happens when a cripple and a thalidomide baby breed. Kill it before it’s allowed to spawn any more than it already has.