How to Avoid Disaster While Writing a Disaster Recovery Plan
Your organization needs a disaster recovery (DR) plan. The sun may be shining now, but it won't be forever -- disasters are a matter of when, not if. And when disaster does strike, having made a DR plan in advance may mean the difference between smoothly resuming activities and going out of business.
On top of that, your insurance company, your bank, your customers or government regulations may require that you have a DR plan. So writing a plan is also a matter of when, not if.
Not the End of the World
You could commission a DR consultant to write your plan -- many companies do. The result is often a doomsday script that would leave Hollywood giddy, calling for the replication of every byte of data that your organization possesses, from the moment it's generated, transmitted to an off-site backup facility in another city, continent or planetary body. Failure to comply, you'll be warned, will risk paralyzing your entire supply chain if anyone so much as sneezes. The consultant will stand ready to implement the plan for a further fee -- a fee that will seem quite reasonable, compared to the mind-boggling cost of the hardware and telecommunications services that you'll need.
Or you can write your own plan, relying on common sense and a few pointers:
- Not all disasters are the same.
- Not all data is equal.
- How up-to-date your data needs to be is a critical factor.
- How long you can afford to be down is not obvious.
As for disaster varieties, if your basement is flooded, you'll need to throw all your resources into a speedy recovery effort. If your city is flooded, paralyzing you as well as your supply chain and customers, any emphasis on immediate recovery is pointless, especially as your staff may have better things to do than come to work.
Likewise, an expensive hot site (a functional site maintained in reserve in another city, waiting for your relocated staff to arrive and turn on the lights) may not be a reasonable alternative, since your staff won't want to leave their distressed families. In such cases it's better to arrange for temps in another city to answer calls, take messages and post bulletins on your Web site.
In less dire situations, you might arrange to borrow space at another business for a skeleton staff, and/or arrange for employees to work from home.
As for that asteroid poised to vaporize your hemisphere, don't waste time on DR planning. There are disasters from which there will be no recovery.
Off-Site Storage Doesn't Solve Everything
A related issue is the assumption that you always need off-site storage for your backups, in case there's a fire. But computer hardware is about as fire-resistant as things get. If there is a fire that's destructive enough to wipe out the IT functions, all other operations will be paralyzed, too, so quick IT recovery may not be an issue.
Badly written code that runs wild, malicious employees, intruders, human error -- these things are far more likely to trigger disasters than anything represented in the exciting footage you see on the evening news. But such dangers are rarely considered in DR plans.
Meanwhile, there may be some data you can't operate without, and plenty of data that's not worth the effort involved to protect it for DR purposes, or could be recreated anyway. Unfortunately, everyone typically thinks their data is priceless. The decision about what data needs to be protected should be made on an enterprise-wide basis, and not left to the IT department.
Identifying What's Really Important
When making the decision, it is typical to identify which IT processes are critical to the operation of the business, and then identify what data those processes rely on. In some cases the deciding factor is the value of the customer that the data identifies.
Having identified what data you need to save, the next question is how up-to-date that data needs to be. This has a huge impact on the cost of your DR efforts. Making backups and saving copies on a weekly basis, for instance, is easy and inexpensive. Sending copies off-site for additional security is also not very expensive. Remember, however, that if there is a disaster you will lose some work, because your data will be current only as of the last backup. In some cases, starting over as of last week is acceptable. In other cases, it means going out of business.
But keeping saved data current gets dramatically costlier the closer you get to real time. Being current within three minutes, for instance, may cost a hundred times more than being current within 30 minutes.
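To make the trade-off concrete, here is a rough sketch in Python of how much data is at risk under different backup cycles. The function name and the figure of 200 records per hour are hypothetical, chosen only for illustration; the cost side of the comparison depends on your own vendors and is not modeled.

```python
from datetime import timedelta


def worst_case_loss(backup_interval: timedelta, records_per_hour: float) -> float:
    """Records lost if disaster strikes just before the next backup runs."""
    return backup_interval.total_seconds() * records_per_hour / 3600


# Hypothetical workload: 200 new records per hour (an assumption, not a benchmark).
RECORDS_PER_HOUR = 200

schedules = {
    "weekly": timedelta(weeks=1),
    "every 30 minutes": timedelta(minutes=30),
    "every 3 minutes": timedelta(minutes=3),
}

losses = {name: worst_case_loss(interval, RECORDS_PER_HOUR)
          for name, interval in schedules.items()}

for name, lost in losses.items():
    print(f"{name}: up to {lost:,.0f} records lost")
```

The point of running numbers like these is to ask whether the business could re-enter or do without the lost records, before paying for a tighter cycle.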
However, real-time backups may be attractive to some organizations, since the issue of having up-to-the-moment backup is also tied to the issue of recovery time. The way to stay up-to-the-moment, and the fastest way to recover, is to replicate each local server at a remote location, with high speed connections between the two. Everything that gets stored in the local server is also, and at the same time, stored in the remote one. If something happens to the local server, you may be able to switch to the remote server without any perceptible downtime. Since the two servers are in separate locations, you can continue operating even in the face of a local catastrophe.
Time and Money
Obviously, replicating the hardware is expensive, and the communications are not cheap. What is less obvious in advance is that keeping the local and remote servers synchronized will slow down the business function, since the local system will have to confirm that an entry has indeed been replicated on the remote system before moving to the next operation.
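The slowdown can be estimated with simple arithmetic. The sketch below assumes writes are processed one at a time and that each write must wait for a full round trip to the remote site before the next one starts; the 1 ms local write time and 20 ms round trip are made-up numbers, not measurements of any real system.

```python
def writes_per_second(local_write_ms: float, round_trip_ms: float = 0.0) -> float:
    """Serialized write throughput when each write waits for remote confirmation."""
    per_write_ms = local_write_ms + round_trip_ms
    return 1000.0 / per_write_ms


# Hypothetical timings: 1 ms to write locally, 20 ms round trip to the remote site.
local_only = writes_per_second(1.0)
synchronous = writes_per_second(1.0, round_trip_ms=20.0)

print(f"Local only:        {local_only:.0f} writes/sec")
print(f"With remote mirror: {synchronous:.1f} writes/sec")
```

Under these assumptions the mirrored system handles roughly one-twentieth of the writes per second. Real systems batch and pipeline writes, so the actual penalty will differ, but the distance between the two sites always shows up somewhere.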
For banks or financial institutions, every byte of customer and transaction data collected during every second of operation may be priceless, and they may decide it is worthwhile to spend millions to avoid downtime. If you stand to lose US$5 million in revenue for every hour you are down, spending an extra $1 million per month on IT may seem reasonable if it avoids downtime.
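The arithmetic behind that judgment is a simple break-even calculation, sketched here in Python. The function name is illustrative; the dollar figures are the ones from the example above.

```python
def breakeven_downtime_hours(monthly_dr_cost: float,
                             revenue_loss_per_hour: float) -> float:
    """Hours of avoided downtime per month at which the DR spending pays for itself."""
    if revenue_loss_per_hour <= 0:
        raise ValueError("revenue_loss_per_hour must be positive")
    return monthly_dr_cost / revenue_loss_per_hour


# Figures from the example: $1 million per month on IT, $5 million lost per hour down.
hours = breakeven_downtime_hours(1_000_000, 5_000_000)
print(f"Break-even: {hours * 60:.0f} minutes of avoided downtime per month")
```

At those rates the extra spending pays for itself if it prevents just 12 minutes of downtime a month. Plug in your own, probably much smaller, hourly loss figure and the break-even point moves out accordingly.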
But other organizations typically over-estimate the impact of possible downtime, just as they often over-value their data. For most businesses, it is simply not the case that a customer who can't get through to your portal at a particular moment will go to a competitor and be lost forever. Most customers come to your site to deal with you specifically, and if the site is down for an hour once a year, they will simply come back later. For such businesses, spending vast sums to avoid any possibility of downtime is pointless.
Like the decision about what data to save, the decisions about how current the backups should be and how fast recovery should be need to be made on an enterprise-wide basis.
Once you've identified likely and recoverable disaster scenarios, identified what data should be saved, and decided what cycle to use in saving, then your DR plan should pretty much begin to write itself.
Just don't stand in front of any asteroids.
Paul Froutan is vice president of research and development at Rackspace Managed Hosting.