FTP Online
 
 

Build Business Service Metrics
Application architects and operations management can work together to develop self-monitoring distributed applications.
by Brian Connolly

Posted March 15, 2004

Web services, distributed application frameworks, and multitier server architectures have made it possible to develop extraordinarily complex applications. Organizations can use these technologies to offer a rich set of business transactions to their customers. On the client side, an enterprise can use Web services to formally define the services it offers so that they are available to a variety of internal and external client platforms. On the server side, enterprises can act as Web service clients, taking the services offered by external suppliers and combining them with their own internal offerings to add value for their customers.

Development organizations build these applications in a logical manner. They create business objects, transactions, and services within a distributed application development framework. This framework allows the development team to defer the assignment of business objects to physical tiers and machines until deployment time. Operations management takes over at deployment time and attempts to support the application by monitoring the physical plant, responding to events, and planning future capacity.

This logical-physical dichotomy is the fundamental operations management problem. Two different world views are in play: development's logical view of business objects and operations' physical view of resources and constraints. Both views are critical to an enterprise's success. The enterprise uses the logical view to develop external business value. The physical view is equally important, because resource monitoring and capacity planning allow an enterprise to sell this business value more cheaply than its competitors.

It's important to understand this dichotomy (see Figure 1). Enterprises implement applications as a set of interacting business objects. The enterprise sells business transactions to customers, and these transactions usually involve a complex series of object interactions. Within the server, several tiers of application servers might interface with the enterprise's resources and data. An enterprise can also use external services within the transactions it provides to its customers, such as order processing, credit approval, order tracking, and shipping.

Development builds a logical model of the enterprise application and usually releases distinct components of it at deployment time. Note that from the external, service-level agreement perspective, problems are reported to operations management in terms of business transaction issues. A customer makes a complaint: "These transactions take too long in my client application." The complaint is directed to operations management, and the dichotomy between abstract transactions and low-level physical resources becomes a critical obstacle.

Tools Can Help
Tools such as HP OpenView and IBM Tivoli can provide an accurate, in-depth picture of machine utilization, hardware and software faults, and network utilization. However, this is only part of the information that operations management needs to diagnose a problem accurately. Operations management lacks both the logical model of the business application and the runtime mapping between external customer transactions and the chain of physical resources those transactions require, so it can't identify critical bottlenecks from the customer transaction perspective.

Consider possible reasons for a customer complaint about a long-running transaction:

  • The external network might delay the request even before it arrives at the enterprise.
  • The transaction might quickly complete, but congestion in the external network might stall confirmation delivery.
  • It might be delayed because of a hold-up in the request to an external service provider.
  • It might be delayed because of an internal resource bottleneck—but which one?
  • It might be delayed because it is an unusually complex transaction, requiring many resources. In other words, it might not be a "problem" at all, but rather something that falls outside the service-level agreement.

The problem worsens because operations management must also monitor service-level agreements that specify transaction loads and response time guarantees against mixes of transaction types. Additionally, operations management must record the transaction stream and forecast future requirements for physical capacity, even though it has no clear idea of the relationship between physical resources and business services.

The best that operations management groups can do to monitor service quality from the client's perspective is to use "active recording." In this approach, test scripts mimic external clients and record the response times they experience. The best datacenter management products supply tools to make this as easy as possible, but it is still a custom software development effort. And although this approach helps somewhat, it's limited: it attempts to study a dynamic system using static test scripts, and it provides no information on the mix of transactions that real clients are submitting right now.

The Way Forward
Development must address the fundamental problem by combining its logical view of the application with operations' physical view. The solution is for development to embed active monitoring in the deployed applications. The idea is: Record transaction metrics in business objects, upload these metrics by "piggybacking" metrics reports on new requests, and design a metrics reports database that gives operations the customer transaction view it needs to succeed.

Development can embed active metrics recording in its applications by extending its business objects. If development has had the foresight to model client and server objects in a few base classes, some relatively minor changes in these base classes will cover most of the effort. You can embed active transaction metrics recording in a distributed application by following a few principles.

Follow a Few Principles
Principles of business transaction metrics recording include:

  • Every external or internal transaction client records transaction parameters and response times against its server objects.
  • You can extend application protocols so that requests can carry a transaction metrics payload and responses can carry a transaction identifier.
  • Server objects supply globally unique transaction identifiers for new customer transaction requests.
  • Client applications and internal client objects upload metrics on preceding transactions when subsequent requests are made to server objects.
  • Server objects strip off any metrics payloads received from clients and place them in a low-priority queue for logging.
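The principles above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Client and Server classes, the dictionary-based request format, and the field names are all hypothetical, and an in-process method call stands in for the real application protocol.

```python
import queue
import time
import uuid


class Server:
    """Hypothetical server object: issues transaction IDs and queues metrics."""

    def __init__(self):
        # Stand-in for the low-priority logging queue.
        self.metrics_queue = queue.Queue()

    def handle(self, request):
        # Strip any piggybacked metrics reports before doing the real work.
        for report in request.pop("metrics", []):
            self.metrics_queue.put(report)
        # Supply a globally unique identifier for the new transaction.
        txn_id = str(uuid.uuid4())
        # Placeholder for the actual business logic.
        return {"txn_id": txn_id, "result": request["op"].upper()}


class Client:
    """Hypothetical client object: times each call, uploads the report later."""

    def __init__(self, server):
        self.server = server
        self.pending = []  # reports on completed transactions, not yet uploaded

    def call(self, op):
        # Piggyback reports on preceding transactions onto this request.
        request = {"op": op, "metrics": self.pending}
        self.pending = []
        start = time.time()
        response = self.server.handle(request)
        end = time.time()
        # Record this transaction; it travels with the next request.
        self.pending.append({
            "txn_id": response["txn_id"],
            "op": op,
            "client_start": start,
            "client_end": end,
        })
        return response["result"]
```

Note that the report for a transaction is always one request behind: the last report stays on the client until another request is made, so a real design would probably also flush pending reports on shutdown or after a timeout.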

Here's how the enterprise application could incorporate business metrics recording (see Figure 2; new components appear in blue). Client applications automatically record metrics on each transaction requested by the customer. When a transaction completes, a metrics report is attached to the next requested transaction. On the server gateway, the server strips off these metrics reports and queues them for posting to a metrics database. Within the server, client business objects also record metrics on the server objects they use, which are queued to the metrics database as well.
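The posting step, in which queued reports move into the metrics database, might look like the following sketch. It assumes SQLite as the metrics store and a four-column table named txn_metrics; both are illustrative choices rather than part of the design described here, and the report field names are hypothetical.

```python
import queue
import sqlite3


def open_metrics_db(path=":memory:"):
    """Open (or create) the hypothetical metrics database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS txn_metrics ("
        "txn_id TEXT, txn_type TEXT, client_start REAL, client_end REAL)"
    )
    return conn


def drain_to_database(metrics_queue, conn, batch_size=100):
    """Move up to batch_size queued reports into the metrics table.

    Returns the number of reports posted.
    """
    rows = []
    while len(rows) < batch_size:
        try:
            report = metrics_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to post right now
        rows.append((
            report["txn_id"],
            report["txn_type"],
            report["client_start"],
            report["client_end"],
        ))
    # One transaction per batch keeps the database overhead low.
    with conn:
        conn.executemany("INSERT INTO txn_metrics VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

Running the drain in a background worker at low priority, in small batches, keeps metrics logging from competing with customer transactions for server resources.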

Add Metrics Payloads
You can extend client/server application protocols to add these optional transaction metrics payloads. A typical payload has this information:

  • A unique external transaction identifier.
  • The start and end times of the transaction from the client's perspective.
  • The start and end times of the transaction from the server's perspective.
  • Application information on the transaction, including the business transaction type and any associated transaction parameters. This allows an after-the-fact classification of transaction complexity and the server resources required.
  • The physical identifier of the machine that initiated the transaction.
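As a rough illustration, the payload fields listed above could be modeled as a small record type. The class name, field names, and the choice of JSON as the wire format are assumptions made for this sketch; an actual protocol extension would use whatever encoding the application already carries.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MetricsPayload:
    """Hypothetical metrics report attached to a later request."""

    txn_id: str            # unique external transaction identifier
    client_start: float    # start and end times as seen by the client
    client_end: float
    server_start: float    # start and end times as seen by the server
    server_end: float
    txn_type: str          # business transaction type
    origin_machine: str    # physical identifier of the initiating machine
    txn_params: dict = field(default_factory=dict)  # transaction parameters

    def to_json(self):
        """Serialize the report for transport alongside a request."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild a report that was stripped off an incoming request."""
        return cls(**json.loads(text))
```

Because the server start and end times originate on the server, this sketch assumes the server returns them in its response so that the client can include them in the report it uploads later.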

Typically, a business transaction involves many steps. A customer client application initiates a transaction. Internally, the server decomposes this transaction into more primitive operations, each of which involves an exchange between one internal server client and one server resource. You can extend the client/server payload to record metrics on those dependent operations as well. To do this, pass the external transaction identifier (which uniquely identifies the triggering business transaction) along with each internal resource request, and augment it with an identifier that uniquely identifies that request.
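One possible way to derive identifiers for the dependent operations is to append a per-transaction sequence number to the external transaction identifier. The TransactionContext class and the "id/sequence" format below are hypothetical; any scheme that keeps each internal request traceable to its triggering transaction would serve.

```python
import itertools


class TransactionContext:
    """Hypothetical context passed along with each internal resource request."""

    def __init__(self, txn_id):
        self.txn_id = txn_id                  # external transaction identifier
        self._sequence = itertools.count(1)   # numbers the internal requests
        self.operations = []                  # metrics on dependent operations

    def record(self, resource, start, end):
        """Record one internal request and return its unique identifier."""
        op_id = f"{self.txn_id}/{next(self._sequence)}"
        self.operations.append({
            "op_id": op_id,
            "txn_id": self.txn_id,
            "resource": resource,
            "start": start,
            "end": end,
        })
        return op_id
```

With identifiers of this form, every row in the metrics database that shares a txn_id belongs to the same customer transaction, which is what lets operations follow one transaction across resources.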

Development and operations must team up on the metrics design, and some additional physical plant is required for the metrics queue and database.

How Operations Benefits
Operations management gains significant benefit from these metrics. It now has a near-real-time view of arriving transactions. Operations management can map transactions to resources, and resources to physical machines, and so identify the critical bottlenecks. The transaction types and parameters allow operations management to distinguish complex transactions, which are expected to take longer, from simple transactions that should be dispatched quickly. Operations management can use the metrics history to examine past transaction rates and compare them with marketing forecasts and past predictions, leading to more accurate forecasts of future demand.
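To make this concrete, here is a minimal sketch of the kind of analysis the metrics allow: splitting each transaction's response time into time spent inside the server and time spent outside it, then averaging by transaction type. The function names and report fields are hypothetical and follow the payload contents described earlier.

```python
from collections import defaultdict
from statistics import mean


def latency_breakdown(report):
    """Split client-perceived response time into server and non-server time."""
    total = report["client_end"] - report["client_start"]
    server = report["server_end"] - report["server_start"]
    # Whatever the server did not account for was spent in transit.
    return {"total": total, "server": server, "outside_server": total - server}


def summarize_by_type(reports):
    """Average each latency component per business transaction type."""
    grouped = defaultdict(list)
    for report in reports:
        grouped[report["txn_type"]].append(latency_breakdown(report))
    return {
        txn_type: {
            key: mean(row[key] for row in rows)
            for key in ("total", "server", "outside_server")
        }
        for txn_type, rows in grouped.items()
    }
```

Each duration is computed from two readings of the same clock, so the breakdown does not depend on the client and server clocks being synchronized. A large outside_server average points toward the external network, while a large server average points toward an internal bottleneck or a genuinely complex transaction type.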

Perhaps most importantly, metrics provide the evidence needed to decide whether a customer problem is attributable to software defects, internal bottlenecks, or delays from external service providers. Without such evidence, the uncertainty sometimes results in tension between operations and development staffs. By collaborating to design and implement a general business transaction metrics infrastructure, development and operations can work toward the common goal of better serving their customers.

About the Author
Brian Connolly is an independent consultant and author whose specialty is enterprise transaction systems. He has designed international financial trading systems for the foreign exchange and futures markets. You can reach him at brian@ideajungle.com.