December 2002 Issue
XML poses some interesting challenges for programmers. This is the first of a series of columns in which I will look at XML's interaction with programming languages.
XML's schema model is not as hardened as the type system of a programming language, but in some ways it is richer. Programming languages have nothing even remotely equivalent to mixed content, for example. Mapping XML into program data structures inherently risks losing semantics and even data, because unexpected annotations may be stripped out or the schema may simply be too flexible for the language.
To illustrate, given an incoming XML message x, imagine that the programmer wants to compute the price-earnings ratio:
XML x = getxml("somewhere");
PERatio = x.price/( x.revenues -
Today's programmer has two tools available to parse and manipulate XML files: the Document Object Model (DOM) and Simple API for XML (SAX). Both, as we shall see, are infinitely more painful and infinitely more prolix than the previous code example.
While the DOM can be used to access elements, the language doesn't know how to navigate through the XML's structure or understand its schema and node types. Methods must be used to find elements by name. Instead of the previous simple instruction, now the programmer must write something like:
Tree t = ParseXML("somewhere");
PERatio = number(t.getmember(
"/stock/revenues") - number(
In this example, number converts an XML leaf node into a double. This is not only hideously baroque, it's seriously inefficient. Building the whole tree in memory consumes huge amounts of it, and all of that must then be garbage collected: bad news indeed in a server environment.
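To make the DOM approach concrete, here is a minimal sketch in Java using the standard JAXP and org.w3c.dom classes. It is one possible rendering, not the pseudocode above made official. The class name DomPeRatio and the number helper are hypothetical, and because the expression in the snippet is cut off, the sketch assumes that earnings are revenues minus a hypothetical expenses element.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the names here are illustrative only.
final class DomPeRatio {

    // Returns the text of the first element with the given tag name as a double.
    static double number(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing element: " + tag);
        }
        return Double.parseDouble(nodes.item(0).getTextContent().trim());
    }

    // Parses the whole document into an in-memory tree, then looks up each value by name.
    // ASSUMPTION: earnings = revenues - expenses; "expenses" is not in the original text.
    static double peRatio(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return number(doc, "price")
                    / (number(doc, "revenues") - number(doc, "expenses"));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

Even in this small form, every value is fetched by a string name and converted by hand, and the entire tree is built just to read three leaves.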
Now let's examine how a developer might use SAX to implement the same task. First the developer must set up a ContentHandler to receive the parser's events as it reads the XML file, and then fetch the result of the expression from it. This requires a charming piece of Java like:
XMLReader xmlreader = new SAXParser();
ContentHandler contentHandler =
String uri = "test.xml";
InputSource is = new InputSource(
new FileInputStream(new File(uri)));
double result = contentHandler.getPERatio();
Of course, the developer must write the class that implements the ContentHandler as well as the method getPERatio(), which requires more warm and fuzzy code than will fit on this page (see Listing 1).
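For a sense of scale, a handler of this kind might look roughly like the following. This is not the column's Listing 1, only a compact sketch built on the standard org.xml.sax classes. The class name PERatioHandler and the static parse helper are hypothetical, and, since the original expression is truncated, it again assumes earnings are revenues minus a hypothetical expenses element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; DefaultHandler supplies empty ContentHandler callbacks.
final class PERatioHandler extends DefaultHandler {
    private final Map<String, Double> values = new HashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // discard text seen so far and collect this element's own
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Remember only the leaves the calculation needs.
        if (qName.equals("price") || qName.equals("revenues") || qName.equals("expenses")) {
            values.put(qName, Double.parseDouble(text.toString().trim()));
        }
    }

    // ASSUMPTION: earnings = revenues - expenses; "expenses" is not in the original text.
    // Throws NullPointerException if any of the three elements was absent.
    double getPERatio() {
        return values.get("price") / (values.get("revenues") - values.get("expenses"));
    }

    // Convenience wrapper: streams the document through the handler and returns the ratio.
    static double parse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            PERatioHandler handler = new PERatioHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.getPERatio();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

No tree is built, so memory stays flat, but the programmer now maintains parser state by hand for what was a one-line expression.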
Imagine if the object-oriented revolution had been ushered in with such syntax just to access an object. It is as if the only way to interact with objects were to use reflection. The object-oriented revolution would have been stillborn. Instead, object-oriented languages took care of this plumbing and typing for the programmer.
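The analogy is easy to see in Java itself. The sketch below, in which the ReflectionAnalogy and Stock classes are invented purely for illustration, reads the same property twice: once the way the language intends, and once through java.lang.reflect, where the member is named by a string and the result must be cast by hand.

```java
import java.lang.reflect.Method;

// Hypothetical classes, invented only to illustrate the analogy.
final class ReflectionAnalogy {

    static final class Stock {
        private final double price;

        Stock(double price) {
            this.price = price;
        }

        public double getPrice() {
            return price;
        }
    }

    // Direct access: the compiler checks the member name and the return type.
    static double direct(Stock stock) {
        return stock.getPrice();
    }

    // Reflective access: the member is looked up by name at run time,
    // and the untyped result has to be cast back to a number.
    static double reflective(Object target) {
        try {
            Method getter = target.getClass().getMethod("getPrice");
            return (Double) getter.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable getPrice() method", e);
        }
    }
}
```

The reflective version resembles the DOM and SAX code far more than it resembles ordinary object access, which is the point of the comparison.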
In short, the current situation is unacceptable. With the increasing ubiquity of XML, both as a way to describe metadata and as a way to exchange information between programs and applications, and with the rocketing acceptance of XML Web services, it is becoming increasingly necessary for developers to directly access and manipulate XML documents. They should not need to be rocket scientists to do so.
In the next issue I'll discuss how work that's brewing in the developer community to address these matters holds extraordinary promise for developers everywhere.