XML & Web Services Magazine - End Tag: Setting the Stage

Setting the Stage

February 2003 Issue

Last issue, I began a series of articles discussing the problems programmers have accessing and manipulating XML documents, given the lack of intrinsic XML support in programming languages ("Speaking XML," December 2002/January 2003). Since some readers thought the last article jumped into the subject too rapidly, this time I will better set the stage by distinguishing between two quite different scenarios I see at customer sites today: endpoint processing and waypoint processing.

Adam Bosworth
Vice President, Engineering
BEA Systems Inc.

In endpoint processing, the program is an end point for the XML. Its job is either to extract from the XML the information it needs or to create an XML document from scratch. Typically, the program must do complex procedural things with the document that require code. The program can be "lossy": anything in the incoming document that it doesn't need to know can be discarded.
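To make the "lossy" extraction concrete, here is a minimal sketch using the JDK's DOM parser. The order document, its element names, and the EndpointSketch and Order classes are assumptions made up for illustration; they do not come from any particular customer system.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class EndpointSketch {
    // Hypothetical application object: the only two values this endpoint cares about.
    static final class Order {
        final String sku;
        final int quantity;
        Order(String sku, int quantity) { this.sku = sku; this.quantity = quantity; }
    }

    // Pulls out the two needed values; everything else in the document is ignored.
    static Order extract(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        String sku = firstText(doc, "sku");
        int quantity = Integer.parseInt(firstText(doc, "quantity").trim());
        return new Order(sku, quantity);
    }

    // Returns the text of the first element with the given (assumed) tag name.
    private static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing <" + tag + ">");
        }
        return nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order><sku>A-17</sku><quantity>3</quantity>"
                   + "<audit by=\"broker\">not needed here</audit></order>";
        Order order = extract(xml);
        System.out.println(order.sku + " x " + order.quantity); // prints "A-17 x 3"
    }
}
```

The audit element never reaches the Order object, which is acceptable only because nothing downstream of an endpoint needs it.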

In waypoint processing, the program acts on an XML document as it flows through a process such as a stock trade, purchasing request, or new-hire document. In this case the program shouldn't be "lossy," since it doesn't know what functions precede or follow it in the business process or what annotations they may have made up the line or expect down the line. In these cases, simple transformation from one schema into another may be all that is required, and a suitable declarative transformation language may make code unnecessary. This will be increasingly common as the role of message brokers, Web service brokers, and integrative applications explodes over the next few years.
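By contrast, a waypoint has to hand on whatever it received. The following sketch, under the same assumptions (a made-up request document and a hypothetical WaypointSketch class), adds one annotation to a document and serializes the whole tree again, so that elements the program knows nothing about survive the trip.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class WaypointSketch {
    // Appends an <approval> element (an assumed name) and returns the full document.
    static String approve(String xml, String approver) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // The only change this waypoint makes to the document.
        Element approval = doc.createElement("approval");
        approval.setAttribute("by", approver);
        doc.getDocumentElement().appendChild(approval);

        // A Transformer created without a stylesheet copies the tree unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<request><item>desk</item><note from=\"hr\">rush</note></request>";
        // The <note> annotation from an earlier step is still present in the output.
        System.out.println(approve(xml, "finance"));
    }
}
```

Holding the whole tree in memory is the simplest way to avoid losing content; it is one option among several, not a recommendation.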

Both kinds of processing share unavoidable problems dealing with change. If the application logic cannot change independently of the document grammar, then the system is fragile. (I call a system in which the two can change independently loosely coupled.) Loose coupling requires an explicit model for mapping between the document and the code. If the mapping itself is code and that code is hard to read, then the system is hard to maintain and modify in the face of change. Further, both kinds of programs may need to make decisions based on the content of the message, and those decisions can greatly modify what they extract from the message and how they do it.
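One way to picture an explicit mapping model is a table that pairs each application field with a path into the document. The sketch below assumes a JDK that includes the javax.xml.xpath package; the MappingSketch class, the field names, and both versions of the order grammar are hypothetical. When the grammar changes, only the table changes, and the extraction code stays the same.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class MappingSketch {
    // Evaluates each XPath in the mapping table and returns field name -> value.
    static Map<String, String> extract(Map<String, String> mapping, String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            fields.put(entry.getKey(), xpath.evaluate(entry.getValue(), doc));
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        // Version 1 of the assumed grammar keeps the SKU in a child element.
        Map<String, String> v1 = new LinkedHashMap<>();
        v1.put("sku", "/order/sku");
        // Version 2 moves it to an attribute; only the mapping entry changes.
        Map<String, String> v2 = new LinkedHashMap<>();
        v2.put("sku", "/order/line/@sku");

        System.out.println(extract(v1, "<order><sku>A-17</sku></order>"));      // {sku=A-17}
        System.out.println(extract(v2, "<order><line sku=\"A-17\"/></order>")); // {sku=A-17}
    }
}
```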

There are essentially three different technologies in use today that address the problem. The first, mapping between the XML and objects using instances of classes already in your application, has several variations. You can load the incoming XML into a DOM and then use the DOM API to access the data and map it into the object. For outgoing XML, you write code to build strings or again manipulate the DOM, or better yet, JDOM. Processing XML with Java, by Elliotte Rusty Harold, treats this subject really well and should be read by anyone processing XML in Java today. Alternatively, you can stream incoming XML through an event-based parser such as SAX and fill in the objects as the events arrive. Or you can define a declarative language that is used to describe at a higher level how to do some combination of the previous techniques.
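The streaming variation can be sketched with the SAX parser that ships with the JDK. The handler below collects the text of every sku element (an assumed element name) into a list as the parser reports events; the SaxSketch class is hypothetical, and a real mapping would fill in application objects rather than strings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxSketch {
    // Streams the document once and returns the text of each <sku> element.
    static List<String> skus(String xml) throws Exception {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <sku> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("sku".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("sku".equals(qName)) {
                    found.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order><sku>A-17</sku></order>"
                   + "<order><sku>B-2</sku><memo>skipped</memo></order></orders>";
        System.out.println(skus(xml)); // prints [A-17, B-2]
    }
}
```

Even in this small case the state tracking lives in the handler's fields, which hints at why hand-written SAX mappings become hard to read as documents grow.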

A second approach is to load incoming XML into an automatically generated object whose class definition is inferred from the incoming document's XML schema (or vice versa on outgoing responses). Then write code to pull items out of this object and put them into the objects already in use in the receiving application. JAXB is an example of this model and SOX can be used to do this given a suitable SOX reader.

Finally, you can use a different language to process the XML than the one used to write the application itself. XSLT is the one used most commonly today.
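As an illustration of this split between two languages, the sketch below embeds a small XSLT 1.0 stylesheet in a Java string and runs it through the JDK's javax.xml.transform API. The XsltSketch class, the stylesheet, and the order and purchase element names are assumptions for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltSketch {
    // A hypothetical stylesheet that maps <order><sku>..</sku></order> to <purchase item=".."/>.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/order\">"
        + "<purchase item=\"{sku}\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the given document.
    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<order><sku>A-17</sku><quantity>3</quantity></order>"));
    }
}
```

The mapping logic now lives in the stylesheet rather than in Java, so the Java code sees only strings going in and coming out.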

It is the thesis of this series of articles that these technologies are only rarely suitable for either waypoint or endpoint processing. As I wrote in the prior article, they place an intolerable burden on the developer. Two exciting new technologies, extensions to JavaScript for native processing of XML (coming up through the aegis of ECMAScript) and XML Query (from the W3C), can vastly improve performance and productivity for this sort of programming.

In the next article, I will introduce typical examples for endpoint and waypoint processing. Then, if space allows, I will introduce XML Query and discuss why I believe it is a far better solution than XSLT for mapping and translation between XML and Java and between XML and XML.