Searching for Holy Grails
Author and expert Jon William Toigo believes that you need to keep striving for the perfect storage solution and an effective disaster recovery plan.

Posted September 15, 2003

Jon William Toigo is a 20-year IT veteran and author of two essential books on disaster recovery (DR) and storage: Disaster Recovery Planning: Preparing for the Unthinkable, currently in its third edition, and the forthcoming book, The Holy Grail of Network Storage Management. He is also CEO and Managing Principal of Toigo Partners International LLC, an independent consultancy and technical research & analysis firm, and chairman of The Data Management Institute LLC, a professional development organization for those who design, plan, manage and administer storage infrastructure and data assets. He has written 13 books and over 1,000 technology articles.

Jon spoke with FTPOnline editors about how you can create a DR plan, how you should deal with new storage challenges, and where to focus your efforts in the future.

How to Begin Planning
FTPOnline: Did the events of September 11th spur greater management support for DR planning, or are many IT managers still trying to get support in this area?

Jon Toigo: There was certainly an uptick in interest in DR planning immediately following 9/11. A recent survey from Imation found that 56 percent of the surveyed companies with DR plans implemented regular testing for the first time, while 43 percent moved data off-site, 42 percent established regular update procedures, 39 percent increased budgets, and 26 percent implemented a formal DR plan for the first time. Of course, the survey also revealed the sad truth that only one in three of the companies surveyed actually had a plan that was tested on a regular basis.

Post-9/11, a continuing economic downturn also militated against spending any money whatsoever on DR in many firms. In lean economic times, it's tough to get senior management to approve expenditures on capabilities that, in the best of circumstances, would never be used.

FTPOnline: If a company has no disaster recovery plan, where should it begin? Are there common elements regardless of company size?

Jon Toigo: If you have limited resources, you need to spend what you have on three things—data protection, disaster avoidance, and awareness building. The first two have obvious value: fire detection, alarm, and suppression systems and similar disaster avoidance technologies both help prevent avoidable disasters and protect your most precious asset, people. After trained staff, the asset you most need to make a recovery is data, which is irreplaceable. Without these two assets, personnel and data, it doesn't matter how well you've planned logistics for server, network, or end-user computing environment recovery.

Awareness building is the third key area where effort should be placed. Many n-tier client/server applications and hosting platforms being built today defy cost-effective recovery because of design decisions that are faulty from a disaster recovery perspective. Ask the systems designer why he or she chose this middleware, which depends on hard-coded remote procedure calls, over that middleware, which dynamically discovers application components and would enable expeditious system recovery without 1-for-1 platform replacement in a disaster, and most of the time he or she will simply respond, "Because nobody told me to." DR needs to become a consideration at every stage of application development and platform architecture. It can no longer be cost-effectively delivered as a bolt-on after systems have been rolled into production.
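
To make the design point concrete, here is a minimal Python sketch of the two approaches; the service name, addresses, and registry are hypothetical illustrations, not taken from any product Toigo names. The fragile version welds a dependency to one physical host, so recovery demands an identical replacement; the recoverable version resolves the dependency by name at run time, so a single registry entry can be repointed at whatever replacement gear is available.

```python
def rpc_call(addr: str, method: str) -> None:
    """Stand-in for a real RPC client; prints instead of calling out."""
    print(f"calling {method} at {addr}")

# Fragile design: the dependency is hard-coded to one physical host.
# Recovering this app requires a server answering at exactly this address.
ORDER_SERVICE_ADDR = "10.1.4.22:9000"

def fetch_orders_fragile() -> None:
    rpc_call(ORDER_SERVICE_ADDR, "list_orders")

# Recoverable design: the dependency is discovered by name at run time.
# After a disaster, repointing one registry entry at replacement
# hardware restores service without 1-for-1 platform replacement.
SERVICE_REGISTRY = {"order-service": "10.1.4.22:9000"}

def fetch_orders_recoverable() -> None:
    addr = SERVICE_REGISTRY["order-service"]  # dynamic lookup, not a constant
    rpc_call(addr, "list_orders")

fetch_orders_recoverable()  # calling list_orders at 10.1.4.22:9000
```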

Challenges, Costs, and Best Practices
FTPOnline: What are some of the main challenges and costs involved in DR planning?

Jon Toigo: The primary challenge is usually mapping the inputs and outputs to business processes and ferreting out the infrastructure components that handle them directly or indirectly. The problem is made worse by the fact that there is rarely an up-to-date description available of the business process itself, let alone of the systems, networks, and storage that enable it.

Second, planners confront a major hurdle as they seek to characterize data—to determine what needs to be protected and what doesn't, and what needs to be restored immediately versus what can wait awhile. When the computer was originally designed, some brainiac decided data should be self-destructive: it overwrites itself whenever it is modified. We need a new mechanism that attaches headers to data identifying what application was used to create it and what protection and retention characteristics it manifests. I describe a solution to this problem in my next Holy Grail book. It could be implemented readily, especially as Microsoft moves data storage away from files and into object-oriented databases.
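
As a rough sketch of the kind of self-describing header Toigo is describing (the field names and protection classes below are illustrative assumptions, not his actual proposal), imagine every stored object carrying metadata that protection and restore tools could act on directly:

```python
from dataclasses import dataclass

@dataclass
class DataHeader:
    """Illustrative self-describing metadata attached to a stored object."""
    originating_app: str   # which application created the data
    protection_class: str  # e.g. "mirror", "nightly-backup", "none"
    retention_days: int    # how long the data must be kept

def restore_priority(header: DataHeader) -> int:
    """A recovery tool could triage restores from the header alone,
    instead of guessing from file names and directory paths."""
    return 0 if header.protection_class == "mirror" else 1

payroll = DataHeader("payroll-app", "mirror", retention_days=2555)
scratch = DataHeader("cad-viewer", "none", retention_days=30)
print(restore_priority(payroll), restore_priority(scratch))  # 0 1
```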

Finally, most planners confront the challenge of getting sign-off on a purchase order from cash-strapped senior management. You need to recontextualize DR planning so that the strategies you seek to implement do more than simply reduce risk. They must deliver the whole enchilada of business value: risk reduction, cost savings, and business enablement. So, look for dual-use strategies—for example, network resiliency strategies that have the additional benefit of improving application performance for end users.

FTPOnline: Are there some general "best practices" that can help with DR planning?

Jon Toigo: DR is represented by a lot of "gurus" as a mysterious undertaking whose rules are known only to a few privileged practitioners. In fact, it is a straightforward application of common sense. You need to know technology to do the job. You need to know project management. You need to know how to negotiate with vendors. You need diplomacy, tact, business savvy, and excellent written and oral communications skills. Beyond this, keep yourself educated on the subject by reading books, attending the occasional conference, and doing the professional development activities you would normally do to stay current with technology. I don't believe in DRP certifications, but I do believe in data protection certification (certifying that you know something about the technologies available for replicating data).

FTPOnline: How do distributed computing environments make DR planning more complicated?

Jon Toigo: Distributed computing is a two-edged sword, really. Distributed systems may be more survivable if measures are taken deliberately to implement redundancies and networks are fully meshed. After the Kobe earthquake a few years ago, a company with a distributed environment was back up and running within four hours by working around damaged platforms, while a company with a centralized IT infrastructure was down for four weeks.

However, as I mentioned earlier, distributed environments often fall prey to unenlightened designs. Rather than building distributed architectures that can be rebuilt on the fly from different types of servers, networks, and storage devices, designers too often implement architectural designs that require 1-for-1 replacement of all components, a prohibitively expensive strategy.

FTPOnline: How often do companies need to revisit DR plans? Are there effective ways to test them?

Jon Toigo: As often as you can. Certainly on a routine basis every few months, but also after any new technology or application is implemented. Testing can be done through a paper walkthrough or through the actual implementation of strategies at a recovery site. There is no one right way. The wrong way is not to test at all.

FTPOnline: In your book, Disaster Recovery Planning, you say that you look forward to a time when DR planning books aren't necessary because DR planning will be integral to all companies at all levels. Is that likely to happen anytime soon, and if not, why?

Jon Toigo: It could happen, especially as organizations begin looking for ways to map infrastructure directly to business processes and seek Service Level Agreements from their IT organizations (whether in-house or outsourced). When IT is run like a business, rather than as an exception to the rules of profit and loss, it will be forced to deliver services that are supported by resilient architecture. I also think that next-generation storage technologies—and I'm not talking about Fibre Channel fabrics or the storage area networks (SANs) of today—will embrace a utility model. Running storage as a utility will require that provisions be made for the management of data, not just hardware. Achieving capacity allocation efficiency and capacity utilization efficiency carries with it the burden of managing data replication and providing redundant access options. So what we think of as DR will eventually become part of the design process itself.

Storage Issues and Solutions
FTPOnline: Turning to the topic of storage, you have a new edition of The Holy Grail of Network Storage Management coming out soon. What are some of the new storage issues faced by IT managers? What are ongoing issues that still need to be addressed?

Jon Toigo: Wow. That would require a whole book to answer. The storage issues confronting IT managers come down to two: data protection and data management. We are still interpreting management as the management of devices and components, and contextualizing storage as a repository—somewhere that data goes to sleep. We must now acknowledge that storage is far more dynamic than the von Neumann machine design envisioned. Data is constantly in motion, and we need entirely new layers of management software to deal with this dynamism. For example, we need a way to migrate data based on access frequency, as well as on other criteria associated with the data itself and the application and business process it supports. I suggest a way to do this in the book, and I am sure there are other approaches. As data files are increasingly treated as objects and as the data storage infrastructure becomes truly networked, all kinds of opportunities exist to optimize capacity so we only pay for what we actually use.
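
A toy version of such access-frequency migration might look like the following Python sketch; the tier names and idle-time thresholds are assumptions for illustration, not a policy from the book:

```python
import os
import time

DAY = 86400
# Illustrative policy: tier names and thresholds are assumptions.
TIERS = [
    (30 * DAY, "fast-disk"),    # touched within the last 30 days
    (180 * DAY, "cheap-disk"),  # touched within the last 6 months
    (None, "tape-archive"),     # everything idle longer than that
]

def pick_tier(path: str) -> str:
    """Choose a storage tier from the file's last-access time."""
    idle = time.time() - os.stat(path).st_atime
    for threshold, tier in TIERS:
        if threshold is None or idle < threshold:
            return tier

print(pick_tier(__file__))  # this script was just read: "fast-disk"
```

A real implementation would need richer criteria than access time alone (which many filesystems don't even track reliably), and that gap is exactly what the new management layers Toigo describes would fill.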

On the subject of data protection, current security techniques are still in their infancy. This will become a key issue area as more and more storage is networked. In general terms, the field of alternatives between tape and disk mirroring is expanding exponentially. IT managers confront the double burden of keeping informed about new techniques and sifting through the "marketecture" around the technologies that spews non-stop, like water from a cracked fire hydrant, from the vendor community and its mouthpieces at SNIA and the major research and analysis houses. Getting objective and actionable information about storage technology is becoming a very, very difficult task.

The biggest issue, perhaps, is getting those folks who are tasked with the administration of storage some visibility and credentials. There is no formal job description for a data manager. In most organizations, he is narrowly conceived as a maintenance droid assigned to keep arrays, switches, host bus adapters, and interconnects up and running. Yet even in this task he has little formal training. Think about it: we entrust our most critical non-human asset and our most expensive infrastructure component to folks who have absolutely no formal training in how to do the job. This needs to be addressed quickly, and the role of data manager needs to be firmly established as a discipline within the IT field.

FTPOnline: One chapter in the new storage book is called "Final Word: Tape is Dead...Maybe." What are some of the new alternatives for data protection that fall between tape backup and disk mirroring?

Jon Toigo: They fall into a spectrum, some aimed at reducing the time required to take a backup, others focused on shortening "time-to-data"—also known as restore time. For a long while, the industry said we had only two options: mirror or backup. Now we are discovering that alternative strategies do exist. Techniques like disk-to-disk-to-tape can abbreviate the backup process, while the new crop of "way back machines" (which reduce the need for mirror splits to create synchronous point-in-time recovery volumes) can shorten restore times while alleviating the expense of old-fashioned mirroring. There is some pretty interesting innovation going on in startup land, much less so in established vendor shops.
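
For example, a disk-to-disk-to-tape scheme splits protection into a fast staging copy and a later tape offload. The sketch below is a minimal illustration with hypothetical paths, standing in for what a real backup product would do:

```python
import pathlib
import shutil

STAGING = pathlib.Path("/staging")  # fast disk that absorbs the backup

def stage_to_disk(source: pathlib.Path) -> pathlib.Path:
    """Stage 1 (disk-to-disk): the production system is only tied up
    for this fast copy, which shrinks the backup window."""
    dest = STAGING / source.name
    shutil.copy2(source, dest)
    return dest

def offload_to_tape(staged: pathlib.Path) -> None:
    """Stage 2 (disk-to-tape): runs later, off the production path.
    Stand-in only; a real system would drive a tape library here."""
    print(f"writing {staged} to tape at leisure")

# Hypothetical usage:
# staged = stage_to_disk(pathlib.Path("/data/orders.db"))
# offload_to_tape(staged)
```

The staging copy also helps time-to-data: recent restores come straight off disk while tape handles long-term retention.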

FTPOnline: Why is it important to tailor storage solutions to the type of data being stored?

Jon Toigo: The simple answer is that there is no one-size-fits-all solution. Going with platforms hyped by vendors as the last storage you will ever need to buy has created an unsustainable expense model for storage today. We utilize only about 20 to 30 percent of our storage efficiently, and we own three to eight times the storage we actually need. Plus, the storage is poorly instrumented for management, a major cost accelerator. Data characteristics, largely set by application requirements, should dictate the type of disk, interconnect, and platform architecture suited to storing the data. It is just that straightforward: mom doesn't need a SAN to store recipes on her PC or to surf the Web. The application and its data determine the appropriate storage technology.

FTPOnline: What changes do you think we'll see in storage planning in the next year? The next five years?

Jon Toigo: The storage planning approach needs to become rational. Consumers need to join together to demand real standards to support real interoperability from the "Wild West" industry around storage. I am hopeful that economics will force IT managers to forget brands and to start buying technology based on its measurable benefits. We also need more discipline in the acquisition process and better training for those who manage storage for a living.

FTPOnline: Do you have any other advice about what IT architects and planners should be doing today to move their organizations ahead in both storage and disaster recovery?

Jon Toigo: Just this. You have a job to do in both cases that will never make you the most popular person in the company. Treat it like a sacred trust and don't expect to realize perfection. Both perfect storage and perfect resilience are holy grails.

Sample Chapters
Here are sample chapters from each of the Jon Toigo books mentioned in the interview.

"Data Recovery Planning", Chapter 4 of Disaster Recovery Planning: Preparing for the Unthinkable, 3/e.

"Final Word: Tape is Dead... Maybe", Chapter 9 of The Holy Grail of Network Storage Management.