Wednesday, December 3, 2008

Programming in XML-Aware Languages

There is a lively debate going on over at xml-dev about whether XML applications should (or can) be programmed entirely in XML-aware languages such as XQuery and XSLT. As is usual in such matters, "performance" is being called on to fight a proxy war for religious ideals. All the usual positions are being trotted out: "higher-level languages can't do lower-level things efficiently" (although in this case, amusingly enough, there are partizans on both sides claiming their favourite is "higher"), and the old "it depends on what your application is doing" chestnut, to say nothing of "if it's Turing complete, it can do anything" crossing swords with "just because it is theoretically possible doesn't make it efficient". All so very tiresome and predictable. We have been here before, boys and girls, many many many times, all the way back to the days of FORTRAN versus assembly (and indeed soft programming versus hard programming before that).

In the interests of full disclosure I should confess that I work for a purveyor of an XQuery implementation, so you might suppose I have a dog in this hunt. I don't really. I have seen the dark side of both extremes. I have seen the "forget the XML: just get me my objects, quick" meme lead to much wasted effort in trying to get a data-binding layer to stop being the bottleneck of the application and the focus of disproportionate amounts of development effort or expense in evaluating, integrating, and cursing third-party tools. I have seen this several times, in fact, and each time it made me crazy with frustration to spend so much effort on something so utterly pointless to the end goal. One project went so far as to pickle XML into binary object structures for storage, with lots of special code to handle cross-platform incompatibilities in that binary structure, which was then rehydrated as pure XML for shipment to the business logic layer of the application. That said, I have also seen the dark side of the "XML is so cool, let's put it everywhere" meme, the god-awful DOM code to deal with what was, at the end of the day, just a handful of numeric parameter settings. Stupidity burns on both sides of this debate. (Sturgeon's law, I think.)

What I find immensely frustrating about such language wars is that the notion that "performance" is a meaningful and useful criterion by which to judge the quality of a programming language goes largely unquestioned. As Pauli famously remarked: That's not even wrong. Performance is a characteristic of a specific implementation of a specific algorithm in the context of specific data. A bad algorithm is a bad algorithm, in whatever language it is expressed. A good algorithm is good in any language. What often gets missed is: A good algorithm applied in the wrong context is also a bad algorithm. (Folks: friends don't let friends waste a good radix sort on ordering three numbers.)
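To make the radix sort quip concrete, here is a minimal sketch in Python (chosen purely for illustration; the names radix_sort and sort3 are made up for this example and come from no library). Both functions give the same answer for three numbers, but the first allocates a whole table of buckets on every pass while the second needs at most three comparisons. The algorithm that shines on a million integer keys is pure overhead here.

```python
def radix_sort(values, base=256):
    """LSD radix sort for non-negative integers (illustrative only)."""
    if not values:
        return []
    out = list(values)
    largest = max(out)
    place = 1
    # One pass per digit of the largest value, least significant first.
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for v in out:
            buckets[(v // place) % base].append(v)
        # Concatenating the buckets in order keeps the sort stable.
        out = [v for bucket in buckets for v in bucket]
        place *= base
    return out


def sort3(a, b, c):
    """Order three values with at most three compare-and-swap steps."""
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b
    if a > b:
        a, b = b, a
    return [a, b, c]


print(radix_sort([3, 1, 2]))
print(sort3(3, 1, 2))
```

Neither function is "the fast one" in the abstract. Which is better depends on the data it is handed, which is the point.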

The only tenuous connection "performance" has to "language" is if the language makes it easier to produce a better implementation of an algorithm for the data at hand. Often, "language" here means "readily available, high-quality libraries".

I once saw a very concrete example of this. There was some code written in language X to do some specialized HTML parsing. It was then ported to language Y, which was, in the language wars du jour, supposedly many times slower than language X. It turned out, however, that the ported code ran several times faster. Puzzlement from the local language X partizans ensued. The reason was not too hard to fathom, however: given the libraries language Y had on hand, it was just so much easier to write a better algorithm in it that we did. We could have done so in language X, certainly, but no one ever did, because it was too much effort.

Which is to say, it is human efficiency that is at stake most directly here, not computer cycles. Here it all runs into psychology, and a great deal of baby-duck imprinting frequently gets in the way: I have always constructed my algorithms in such-and-such a fashion, therefore such-and-such a fashion is the better way to construct algorithms. (The prevalence of such an attitude is example #463 of why software engineering isn't actually an engineering profession. Here is another.)

Programming may be the last bastion for the Sapir-Whorf hypothesis, but with the wonderful realization that we aren't stuck with one native tongue. We get to pick which language suits the task at hand, which brings us full circle to the debate: some applications are well-suited to being done entirely in a language with XML concerns as a primary organizing principle, and some aren't. Pick the right one for the right task. Or, as my son would say: Don't Be Dumb.
