Enterprise Architect  
 
 

Scaling Over Time: The Version Problem
See how to solve the problems surrounding management of a system's changes over time.
by Alex Krapf

January 19, 2006

A lot of scientific papers and articles have been written about scalability. The focus has almost invariably been on the problem of scaling over resources or usage patterns, such as scaling over a number of processors or scaling over a number of requests. The problem of scaling over time, however, has largely been ignored.

By "scaling over time," I'm referring to managing a system's changes over time. Today, you usually scale over time by using a version control tool. You typically have the ability to revisit a snapshot of your entire codebase at a certain point in time. You can also look at a former version of any element in your codebase, normally at file granularity. Sophisticated version control systems might also allow you to deal with the elements of your system in terms of a change set, essentially combining changes to related elements into one conceptual change.

Version control systems are obviously an important part of your development infrastructure. Their features and capabilities have a large influence on your development process. Regardless of their feature set, they are all based on one unspoken premise: Versioning is a concept external to your code.

In today's development process, the idea of a version is introduced after you have written your code. The different versions of code that you use are essentially labeled snapshots in time. A later version of an element will not contain any information about the earlier version; all such change-related information exists only outside your code, as metadata in the version control system.

If you were to look at the evolution of a type T in a system, you might see something like this:

Release #   1.0   1.1   1.2   2.0   2.1   2.2   2.3   3.0   3.1
Type        T     T     T     T'    T"    T"    T"    T"    T"

Table 1
The evolution of type T over a number of releases

In version 1.0 of this product, I used the original version of type T. It remained unchanged until version 2.0, when I modified the type, and it became type T'. Then I immediately realized that I had to make another change, and it morphed into type T" with version 2.1. I didn't modify the type further throughout the remaining versions of the product.

Please note that in this example I have not made assumptions about the backward compatibility of any of the changes. The change from type T to T' might have been backward-compatible, while the change from type T' to T" might have been incompatible. The important thing to understand is this: Type T might undergo an evolution that is largely independent of the release numbers or version control labels. What's more, type T might not correspond with an element in the version control system. It might be a small part of a file, or on the other hand, it might span several files.

Traditionally, I have used the release number of a product to version an entire set of types, whether or not they had changed. I bundled a snapshot of the system into a deployment unit that might have a version number associated with it, such as a jar file or a shared library. I typically regard releases as black boxes and do not anticipate that implementation types of different product versions will coexist. Consequently, I often use the same, unchanged type name to represent different versions of a type. The assumption is they will never have to coexist in one context because they represent unrelated points in time.

Developers are increasingly running afoul of this core assumption. Yes, you might have discrete releases of software that never have to coexist, but type T might be a public API used by clients that have their own release schedule. It might be a persistent type: version 1.0 of a product writes out instances that version 3.1 later attempts to read. How does your version control system help you with these problems? The answer is: It does not help at all.

The Application Concept
Most IT professionals tend to equate "application" with the product's current version or maybe, if you're unlucky enough to be on the maintenance team, the product's previous version. You do not usually consider way-back versions or not-yet-conceived future versions. You can typically get away with this misconception because customers share it and run a single version of a product on their systems. They might not be aware of why they are doing this, but it has to do with the difficulties and costs associated with getting rid of the old version and deploying the new one.

Why is a product upgrade so hard? Why can't different versions of a product coexist peacefully? I would posit that it has to do with our broken notion of version management—a problem that's costing IT professionals and customers real money.

The difficulty stems from the fact that you don't treat the sum total of all versions of a product as the application. A new version will essentially be a different product that no longer includes the old version of the product. Any exposed interfaces that have changed could potentially break customers' applications; any changed implementation details that deal with persisted data could potentially break customers' applications. This is hardly a problem if you're dealing with a monolithic, self-contained application, but it becomes a growing problem if you're dealing with published APIs, persistent data, third-party libraries, and so on. It is a huge problem when you're looking at service-oriented architectures (SOAs), which are essentially published application interfaces.

Consider applications that require you to store, maintain, and keep accessible some information for several decades. Such applications might, for example, be from problem domains such as pharmaceutical research, corporate governance (Sarbanes-Oxley), or intelligence. These problem domains will not allow you to punt on the issue of scalability over time.

What Can You Do?
It's not that developers and administrators are all incompetent; it's that the current tool set does not support the notion of change very well.

Let's look at a simple example. I'm using Java as an example because of its concise syntax, but this can easily be generalized to any other language or SOAs. I'll start with an interface Foo that publishes one method:

interface Foo
{
	public void	doSomething();
}

Over time, you realize that it would make sense to introduce an additional method in the interface:

interface Foo
{
	public void	doSomething();
	public void	doSomethingElse();
}

A lot of people would say that merely adding a method to an interface is a compatible change. Far from it! Imagine that you have many implementations of the interface. After your change, every implementation of the interface has to be updated with the additional doSomethingElse() method. Any implementation that is not updated will no longer compile, and an already compiled one will fail with an AbstractMethodError as soon as the new method is called on it. Additionally, some of the existing implementations might not need the new method, but the compiler will force you to add the method implementation anyway.

You might counter that this was the wrong way to version your interface. An interface should never be modified once it has been created; instead, it should be extended through inheritance. So you might propose this design instead:

interface Foo2 extends Foo
{
	public void	doSomethingElse();
}

You certainly solved one problem: You don't have to modify any existing implementations of the interface. Implementations of Foo and Foo2 can coexist peacefully. But what about existing users of the Foo interface? You might have a factory method that originally created an instance of a type implementing Foo:

public Foo	createFoo( String arg1, int arg2 );

Should this function return Foo instances or Foo2 instances in the future, or should it return Foo2 instances through a return type of Foo? Should you introduce another factory method, such as this one?

public Foo2	createFoo2( String arg1, int arg2 );

Will methods that used to take a Foo as an argument now require a Foo2, or not? If a Foo2 is required, should they enforce this through their parameter declaration?
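To make these questions concrete, here is a minimal sketch of one possible answer in plain Java: keep the original factory signature, hand out a Foo2 behind the Foo return type, and let consumers probe for the newer version at run time. The classes FooImpl, Foo2Impl, FooFactory, and FooClient are hypothetical names made up for this illustration, and the factory simply ignores its arguments.

```java
interface Foo {
    void doSomething();
}

interface Foo2 extends Foo {
    void doSomethingElse();
}

// Hypothetical implementations; they only record which methods were called.
class FooImpl implements Foo {
    final StringBuilder log = new StringBuilder();

    public void doSomething() {
        log.append("doSomething;");
    }
}

class Foo2Impl extends FooImpl implements Foo2 {
    public void doSomethingElse() {
        log.append("doSomethingElse;");
    }
}

class FooFactory {
    // Same signature as before, so existing callers still compile.
    // The arguments are ignored in this sketch.
    static Foo createFoo(String arg1, int arg2) {
        return new Foo2Impl();
    }
}

class FooClient {
    // The declared parameter type cannot express "Foo2 preferred",
    // so the version check has to happen at run time.
    static String use(Foo f) {
        f.doSomething();
        if (f instanceof Foo2) {
            ((Foo2) f).doSomethingElse();
            return "used as Foo2";
        }
        return "used as Foo";
    }
}

class FooInheritanceDemo {
    public static void main(String[] args) {
        System.out.println(FooClient.use(new FooImpl()));
        System.out.println(FooClient.use(FooFactory.createFoo("a", 1)));
    }
}
```

Nothing in the signatures tells a reader or the compiler that two versions are in play; that knowledge lives only in the instanceof check and in the naming policy.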

Also, it's silly that you have to use the inheritance mechanism to express a new version of the same concept. It has always bothered me that I have to give up the "perfect" name for a type, simply to version it. But beyond personal preferences, this issue also causes real and expensive problems in current applications. So far, I've just been talking about types in general. It becomes much worse when you consider the domain of application integration. The whole point of integration is the publishing and consuming of interfaces and data that is generated by another entity. How many versions of applications, APIs, and data objects do you think you'll encounter over 20 or 100 years?

The State of the Art
The state of the art is … not very artistic. Mostly, there are programming guidelines, naming policies, extensibility patterns, and best practices, but there is little or no support in the technologies in use today. Some programming languages include the concept of an API version; some architectures include the concept of a service version. Java has a serial version UID that can provide an indication of the compatibility of data or API types, but nothing on the market today offers:

  • Versioning as a first-class language feature.
  • Tool support for type versioning.
  • Tool support for detecting and dealing with version problems.

This example is not a proposal for a Java language extension; I simply intended to illustrate what versioning support as a language feature might look like and what it has to offer:

versioned interface Foo:2
{
	public void	doSomething():1-2; 
	public void	doSomethingElse():2;
}

This interface declaration essentially tells us that interface Foo is a versioned interface and that it exists in two versions (Foo:1 and Foo:2). It also tells us that the first method is present in both versions, whereas the second method is only present in version Foo:2. This approach is already much nicer than having to use two different types because you keep the declarations together and have an immediate understanding of how the type evolved over time.
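The versioned keyword does not exist in Java. As a rough approximation, and only as a sketch, the version in which each method first appeared could be recorded in an annotation and queried through reflection. The @Since annotation and the VersionInfo helper are hypothetical names invented for this illustration; they keep the declarations together in the same way, but the compiler knows nothing about them.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical marker: the first interface version that contains a method.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Since {
    int value();
}

interface Foo {
    @Since(1)
    void doSomething();

    @Since(2)
    void doSomethingElse();
}

class VersionInfo {
    // Returns the sorted names of the methods that exist in the given version.
    static SortedSet<String> methodsIn(Class<?> type, int version) {
        SortedSet<String> names = new TreeSet<String>();
        for (Method m : type.getMethods()) {
            Since since = m.getAnnotation(Since.class);
            if (since != null && since.value() <= version) {
                names.add(m.getName());
            }
        }
        return names;
    }
}

class VersionInfoDemo {
    public static void main(String[] args) {
        System.out.println(VersionInfo.methodsIn(Foo.class, 1)); // [doSomething]
        System.out.println(VersionInfo.methodsIn(Foo.class, 2)); // [doSomething, doSomethingElse]
    }
}
```

The remaining examples return to the hypothetical versioned syntax.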

Now you can create implementations of both versions of the interface, for example:

public class FooImpl implements Foo:1
{
	public void	doSomething()
	{
		Util.doSomething();
	}
}

public class FooImpl2 extends FooImpl implements Foo:2
{
	public void	doSomethingElse()
	{
		Util.doSomethingElse();
	}
}

Notice that you probably don't want to version everything in the system. Here I chose to version the interface but not the implementing classes. So far, this approach is not much different from just using a naming policy, but imagine that you could overload methods based on the versions of their parameters. You don't have to be explicit about the version, but you can choose to be. You could declare an unversioned parameter to indicate that the method's implementation supports any version:

public class FooUser
{
	public int	calculate( Foo f )
	{
		versionswitch( f )
		{
			case 1:
				return 1;
			case 2:
				return 2;
			default:
				return 2;
		}
	}
}

Or you could overload the method based on the version of the argument:

public class FooUser
{
	public int	calculate( Foo:1 f )
	{
		return 1;
	}

	public int	calculate( Foo:2 f )
	{
		throw new VersionNotSupportedException( f );
	}
}
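For comparison, the nearest thing available in plain Java is to overload on two separately named types, such as Foo and Foo2 from the inheritance design shown earlier. The sketch below (OverloadDemo and its newFoo2() helper are hypothetical names) shows the catch: Java picks the overload from the declared type of the argument at compile time, so a Foo2 instance that travels through a variable of type Foo is treated as the old version.

```java
interface Foo {
    void doSomething();
}

interface Foo2 extends Foo {
    void doSomethingElse();
}

class FooUser {
    int calculate(Foo f) {
        return 1;
    }

    int calculate(Foo2 f) {
        return 2;
    }
}

class OverloadDemo {
    // Hypothetical helper that creates a do-nothing Foo2 instance.
    static Foo2 newFoo2() {
        return new Foo2() {
            public void doSomething() { }
            public void doSomethingElse() { }
        };
    }

    public static void main(String[] args) {
        FooUser user = new FooUser();
        Foo2 v2 = newFoo2();
        Foo sameObjectAsFoo = v2;

        // Overloads are resolved from the declared type, not the run-time type.
        System.out.println(user.calculate(v2));              // 2
        System.out.println(user.calculate(sameObjectAsFoo)); // 1
    }
}
```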

So what's the big deal here? Imagine that the compiler might perform these steps:

  • Enforce that every declared version of a type is handled either through an overloaded method or through a version switch.
  • Enforce that you either declare version incompatibility or provide a compatibility path between versioned APIs.
  • Enforce that there are conversion operators between different versions of a versioned serializable type.
  • Inform you of all the changes you need to make to your code if you were to version a certain type. All you would need to do is version the type and try to recompile.

Wouldn't it be nice to write this code, try to compile it, and receive a compile-time error because you are not handling the later version of the interface?

public class FooUser
{
	public int	calculate( Foo:1 f )
	{
		return 1;
	}
}

	Error: FooUser.java, line 1: method calculate(Foo) does not support Foo:2
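Today's Java can approximate the first of these checks with the classic visitor pattern, at the price of considerable ceremony. In the sketch below, every name (VersionedFoo, FooV1, FooV2, FooVersionVisitor, VisitingFooUser) is hypothetical. Each known version gets its own visit method, so adding a visitV3 method to the visitor interface would break every visitor that has not been updated, and it would do so at compile time.

```java
// One visit method per known version of the type.
interface FooVersionVisitor<R> {
    R visitV1(FooV1 foo);

    R visitV2(FooV2 foo);
}

interface VersionedFoo {
    <R> R accept(FooVersionVisitor<R> visitor);
}

class FooV1 implements VersionedFoo {
    void doSomething() { }

    public <R> R accept(FooVersionVisitor<R> visitor) {
        return visitor.visitV1(this);
    }
}

class FooV2 implements VersionedFoo {
    void doSomething() { }

    void doSomethingElse() { }

    public <R> R accept(FooVersionVisitor<R> visitor) {
        return visitor.visitV2(this);
    }
}

class VisitingFooUser {
    // The anonymous visitor must implement a method for every version,
    // otherwise this class does not compile.
    static int calculate(VersionedFoo f) {
        return f.accept(new FooVersionVisitor<Integer>() {
            public Integer visitV1(FooV1 foo) {
                return 1;
            }

            public Integer visitV2(FooV2 foo) {
                return 2;
            }
        });
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        System.out.println(VisitingFooUser.calculate(new FooV1())); // 1
        System.out.println(VisitingFooUser.calculate(new FooV2())); // 2
    }
}
```

This gives a compile-time guarantee only to code that goes through the visitor, and the versions are still separate types with separate names, which is exactly the bookkeeping a language-level feature would take over.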

Service-Oriented Architectures and Versioning
Please don't be distracted by the futuristic and unlikely extension to the Java language used in my examples. The same problems exist to an even greater extent in SOAs. In a service-oriented architecture, you will publish only interfaces, and you will use only serializable data types. Both categories of items will force you to deal with the versioning problem sooner or later.

It is a sad fact that versioning support is as poor in current SOAs as it is in programming languages, even though you need it even more. In a traditional programming language, you're creating an implementation that is usually tightly coupled internally. You control the types that are used, the versions of the third-party libraries that you bundle, the packaging, the deployment, and so on. In a true SOA, on the other hand, you are going to build applications that are stitched together from services that you might not have developed and that are not under your development or deployment control.

Imagine another group creating a new version of a service that your application is consuming. The new version might offer great new functionality, but it might also be totally or subtly incompatible with your application. The other group might be a good corporate citizen and simply add a new version of the service, but this raises other troublesome questions:

  • The other group might prefer to have the new version have the "perfect" name. Now you will have to change all your applications.
  • How long do they have to keep old service versions around? How can you know for sure that no one needs the old version anymore?
  • How do you inform service consumers of newer versions?
  • Can you afford to keep dozens of different service versions (each potentially backed by serious infrastructure) around forever?
  • How can service consumers tell which versions are "compatible"?
  • What does service version compatibility even mean?
  • Where are the tools that enforce consistency in orchestrated service frameworks?
  • How many different versions of serializable data types named "PatientData" do you plan to support? Concurrently? In one application?
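The last question can be made tangible with a small sketch. Without tool support, a common workaround is to write an explicit version tag in front of the serialized data and to convert old layouts by hand when reading. Everything specific here is an assumption made for illustration: the two layouts, the name and birthYear fields, and the UNKNOWN_BIRTH_YEAR placeholder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class PatientData {
    static final int UNKNOWN_BIRTH_YEAR = -1;

    final String name;   // present since layout version 1
    final int birthYear; // added in layout version 2

    PatientData(String name, int birthYear) {
        this.name = name;
        this.birthYear = birthYear;
    }

    // Simulates a record written by an old release: version tag 1, name only.
    static byte[] writeV1(String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(1);
            out.writeUTF(name);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Current writer: version tag 2, name, birth year.
    byte[] write() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(2);
            out.writeUTF(name);
            out.writeInt(birthYear);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads any known layout and converts it to the current in-memory form.
    static PatientData read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int version = in.readInt();
            switch (version) {
                case 1:
                    return new PatientData(in.readUTF(), UNKNOWN_BIRTH_YEAR);
                case 2: {
                    String name = in.readUTF();
                    int birthYear = in.readInt();
                    return new PatientData(name, birthYear);
                }
                default:
                    throw new IllegalArgumentException("Unsupported PatientData version: " + version);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class PatientDataDemo {
    public static void main(String[] args) {
        PatientData old = PatientData.read(PatientData.writeV1("Ada"));
        System.out.println(old.name + " " + old.birthYear);
    }
}
```

Every new layout adds another case to read(), and nothing but discipline ensures that each reader in each consuming application is updated.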

In the past, few applications had to have a lifecycle that spanned decades. Tools are equipped to scale over a few years, maybe up to a decade, but eventually applications accumulate so much "cruftiness" through namespace pollution, unenforceable naming and versioning policies, and so on that a rewrite and the loss of compatibility are taken for granted.

In the future, this is not going to be an option. The government has already proposed or enacted legislation that forces corporations to keep data available and usable forever. You will be in desperate need of formal versioning support in the tools you're using. The current best practices for writing maintainable software will not be good enough, simply because they don't scale over time.

Our technology vendors will have to step up and add versioning support to languages and technology specifications; otherwise, developers are doomed to fail before they even get started with the second version of their product.

About the Author
Alexander Krapf is president and cofounder of CodeMesh Inc. Krapf has more than 15 years of experience in software engineering, product development, and project management in the United States and Europe. Krapf has also worked for IBM, Thomson Financial Services, Hitachi, Veeder-Root, and Document Directions Inc.