The Future of Testing Tools

SPIN and the Future of Testing Tools
Computer scientist and researcher Gerard J. Holzmann talks about SPIN, a free software verification tool for multi-threaded software systems that offers a glimpse into sophisticated testing tools to come.

Posted September 29, 2003

Dr. Gerard J. Holzmann

Dr. Gerard J. Holzmann is the author of the SPIN model checking system, recognized as one of the most powerful and widely used verification systems for distributed software systems. His most recent book, The SPIN Model Checker: Primer and Reference Manual, was published by Addison-Wesley Professional in September 2003. SPIN was recognized in 2001 with the ACM Software Systems Award.

Dr. Holzmann is currently the Principal Computing Scientist in the newly formed Laboratory for Reliable Software at NASA's Jet Propulsion Laboratory (JPL). Before joining JPL, he was Director of the Computing Principles Research Department at Bell Laboratories in Murray Hill, New Jersey. Dr. Holzmann has published over 70 technical papers, authored 4 books, and holds 6 U.S. patents.

Dr. Holzmann spoke with FTPOnline editors about how SPIN works, how it can make a dramatic difference in the testing of multi-threaded and distributed software systems, and how it may influence future testing tools.

FTPOnline: What is SPIN?

Gerard Holzmann: SPIN is a software verification tool that can help the user to find and diagnose subtle, concurrency-related bugs in multi-threaded and distributed software systems. The tool is the result of a long-term research project that pretty much started in 1980, when I joined the Unix research group at Bell Labs.

The tool can be used to trap race-conditions, deadlock errors, and errors caused by data corruption that can result when different threads of execution step on each other's data without proper synchronization. It can also be used to prove more sophisticated temporal properties of systems of asynchronous processes.

Curiously, the class of concurrency-related software defects is largely ignored by the classical approaches to software testing, yet it is the one that worries software developers the most. Probably the majority of code that is being written today is designed to execute in a multi-threaded context. Our software development tools have not kept pace with this change: there are virtually no mainstream tools available today that can help a developer to design truly reliable multi-threaded systems software.

Ask any developer and they will admit that trying to fix a concurrency-related bug shortly before a product deadline is one of their worst nightmares. What makes these types of bugs so hard to diagnose is that they are rarely reproducible: a concurrent system behaves like a non-deterministic system, where the precise executions can depend on subtle timings of uncorrelated events. The result is that a damaging bug will manifest itself only intermittently. After a fix has been made for the suspected cause of a race, a developer will almost always have a lingering doubt that the bug will still be there, to strike again and do more damage.

When we first ran into these problems at Bell Labs, we decided that there had to be a better way to deal with these types of software development problems. The result was the SPIN tool. Last year the tool was recognized by the ACM with its prestigious Software Systems award, the same award that this year recognized Java, and earlier had been awarded to systems such as Unix, Tex, TCP/IP, and the World Wide Web.

FTPOnline: How does one use SPIN?

Gerard Holzmann: SPIN works somewhat differently from a regular testing tool. The SPIN user is asked to create a relatively precise description of an algorithm as an executable model. The model can be simulated in various ways, which makes SPIN an excellent rapid-prototyping tool. But the big thing is that each SPIN model can also be verified thoroughly. A SPIN model is typically defined at a slightly higher level than the implementation, to avoid having to include also more trusted, routine parts of an application (like the precise implementation of process scheduling, message passing, or memory management).

Ideally, the level at which this executable model is defined matches the level of a white-board sketch that a developer would use to explain the essence of the working of a system to a peer (and why he believes that it will all work correctly). SPIN can take that description and then thoroughly analyze the behavior that was specified to see if certain types of concurrency-related errors can or cannot occur. It typically takes no more than an hour or so to build a SPIN model, and it usually takes no more than a few seconds for SPIN to show where the bugs are.

Running SPIN for the first time is usually an eye-opener for a new user. SPIN will virtually instantaneously identify possible sequences of behavior that the user never imagined would be possible: precisely the types of behavior that can strike in practice and either greatly annoy the end-customer, or cause inestimable damage by hanging the system or by corrupting data in mysterious ways.

SPIN is powerful enough that it is also possible to define very detailed models of concurrent systems behavior, down to the level of implementation-level code. The analyses can become more compute-intensive if the level of the model is too detailed, though, so there is a benefit in sticking with slightly more general SPIN models. The benefit of a more general model is that the results of an analysis apply immediately to a broader class of possible implementations: the analysis also becomes more generally applicable to a class of possible designs.

FTPOnline: Who should use SPIN?

Gerard Holzmann: One of the more appealing things about the SPIN tool is that it can be used by requirements engineers, system designers, software developers, and by testers. The types of analysis that are supported by this tool can be used before any code has been written, precisely because it can work with higher-level design sketches. A good argument can be made that this is in fact the best place to use the tool.

In practice, though, SPIN is often used after the code for an application has been written, when the classical approaches to software testing are seen to fall short. Since the tool is freely available (from http://spinroot.com) for both commercial and non-commercial use, it requires no great investment, other than a mild learning curve in the use of the tool and the specification language that it supports.

FTPOnline: Can you give examples of applications where SPIN made a major difference?

Gerard Holzmann: There have been many applications of this tool, but a few really stand out. At Lucent Technologies we used SPIN over a period of two years to check that the call-processing software for a new telephone switch was free of concurrency-related bugs. The code was heavily multi-threaded, as most applications are these days. The call processing code was also considered to be the most difficult part of the switch to get right: it really forms the heart of the switch.

We were able to check the code very thoroughly at implementation level (the code was written in C). To do so, we relied on a relatively new feature in SPIN that allows the use of embedded C code within the verification models. Towards the end of the product development cycle for the Lucent switch, we were able to check the call-processing code for compliance with over 200 logical and functional properties in about 40 minutes of CPU time.

During the lifetime of this product, not a single software defect was reported in this part of the code: a highly unusual phenomenon, and a testament to the effectiveness of the SPIN-based approach to testing. Today, SPIN tends to be used mostly for critical software development, and more sporadically in routine software development. At NASA/JPL, for instance, SPIN is used to check that the critical components of flight software for space missions are bug-free.

We estimate that there are between 5,000 and 10,000 people who routinely use the SPIN tool today. The potential number of users for SPIN runs in the hundreds of thousands, though, so we have more work to do.

FTPOnline: Where can we learn more?

Gerard Holzmann: Since the ACM award, SPIN received a lot of attention and has attracted significantly more users. This has led to the publication of a new book that gives a comprehensive overview of how the tool works, and how it can be used. The book, called The SPIN Model Checker: Primer and Reference Manual, was published by Addison-Wesley.

A good place to find out more about the tool and its applications is also the annual SPIN workshop, which has been held since 1995. It is supported by ACM SIGSOFT.

FTPOnline: You wrote an article as part of FTP's "Future of Software" report in late 2000. In it, you predicted embedded smart-source code analyzers warning developers immediately of subtle bugs as they get introduced. How close is SPIN Version 4 to that prediction?

Gerard Holzmann: This is indeed my long-term goal, to turn this prediction into reality. SPIN is known today as one of the most efficient and powerful checkers for multi-threaded software systems. To achieve this level of efficiency, the tool employs search algorithms that are founded on many decades of research in automata theory, logic, and graph theory. Every few years we see another major leap forward when a new type of algorithm is discovered that can reduce runtime and memory use of the tool still further.

It is always dangerous to extrapolate, but if we can continue this trend, it should be possible to deliver on this promise. Within about 5 to 10 years we should have been able to refine this technology to the point where it can be made invisible, and can be "hidden" in a standard language compiler.