Disaster recovery (DR) requires a company to think about many things, but despite the long checklists created to accomplish it, some areas frequently get overlooked in business continuity and disaster recovery planning.
For example, those exploring disaster recovery often get singularly focused on protecting their servers and data — and of course, those are extremely important. However, they don’t capture the entire picture. It is imperative to address all areas of concern before a disaster strikes, not during or after the event.
Make sure you’ve thought through your company’s unique answers to the following critical questions to help ensure the survival of your organization in the event of a disaster.
What Should Employees Do?
This question has evolved in the Covid-19 world, but it still warrants discussion, because what will define the post-Covid world changes every day. Depending on the disaster (e.g., fire, flood, hurricane), your company’s office could be out of commission for some time. If your office is not accessible, how should employees go about working?
It’s great that your servers and data failed over successfully, but what good is that if your employees can’t access them? The middle of a disaster is not the time to start scrambling for solutions.
In the past, I’ve recommended that companies have a work-from-home policy and the required infrastructure in place, specifically infrastructure that can handle, or easily scale up to support, the entire workforce. Nothing will gray a system administrator’s hair faster than dealing with an entire company trying to connect through an undersized VPN.
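A quick back-of-the-envelope check can tell you whether your remote-access setup could absorb the whole company at once. The sketch below is a minimal example. It assumes you know your VPN's concurrent-tunnel limit and usable bandwidth. The `VpnCapacity` class, the `vpn_shortfall` helper, and all the numbers are hypothetical, and you would need to measure the per-user bandwidth figure for your own workloads.

```python
from dataclasses import dataclass


@dataclass
class VpnCapacity:
    """Hypothetical summary of a VPN concentrator's limits."""
    max_tunnels: int       # concurrent sessions the appliance or license allows
    bandwidth_mbps: float  # usable uplink bandwidth in megabits per second


def vpn_shortfall(cap: VpnCapacity, employees: int, per_user_mbps: float,
                  concurrency: float = 1.0) -> dict:
    """Estimate how far the VPN falls short if the workforce connects remotely.

    concurrency is the assumed fraction of employees connected at the same
    time; 1.0 is the worst case right after an office closure.
    """
    users = round(employees * concurrency)
    needed_mbps = users * per_user_mbps
    return {
        "concurrent_users": users,
        "tunnel_shortfall": max(0, users - cap.max_tunnels),
        "bandwidth_shortfall_mbps": max(0.0, needed_mbps - cap.bandwidth_mbps),
    }


if __name__ == "__main__":
    # Illustrative numbers: a VPN sized for roughly a quarter of a 1,000-person company.
    cap = VpnCapacity(max_tunnels=250, bandwidth_mbps=500)
    print(vpn_shortfall(cap, employees=1000, per_user_mbps=1.5))
    # {'concurrent_users': 1000, 'tunnel_shortfall': 750, 'bandwidth_shortfall_mbps': 1000.0}
```

In this example, a VPN sized for about a quarter of the staff comes up 750 tunnels and 1,000 Mbps short when everyone works remotely. Lowering `concurrency` lets you model shift-based or partial remote work.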
Another recommendation is to make arrangements in advance with a property management company to use one of its locations temporarily, or with a local hotel to use its conference room space.
But in our Covid-19 world, working from home is the new normal, and undersized VPN tunnels have been (or are being) right-sized to handle the massive shift to remote work. So, did this question just answer itself? Can we all just post our favorite celebratory meme? Maybe. Maybe not.
Larger-scale disasters can damage your employees’ homes as much as they damage an office park. Hurricane Sandy knocked out power to significant swaths of New York, New Jersey, and Pennsylvania. For some, those outages lasted for weeks.
So, picture this course of events: the office is closed because of a disaster, e.g., a global pandemic. Then a hurricane knocks out power to 40 percent of your critical IT staff. Considering we are in the middle of a pandemic, and there is always an upcoming hurricane season, this is not out of the realm of possibility.
In this scenario, can your systems afford to have staff unavailable for several days? Do you have enough skill diversity/redundancy to overcome a temporary staff loss of 40 percent? What if all your DBAs are in that affected group?
If you are a national or global company with employees scattered across geographic regions, perhaps you have skill redundancy across regions. But what if you are a smaller company with only one office? Maybe you can open the office, just for critical staff. With a small workforce, social distancing might be possible.
Maybe I just sent a cold chill down the spine of your corporate risk officer. Perhaps your employees can go to friends’ or relatives’ homes? Maybe the company will spring for hotel rooms for critical staff so they can work while socially distanced?
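Whichever workaround you choose, it helps to know in advance which skills disappear when a group of people drops out. The sketch below is a minimal way to check this. It assumes a simple, hand-maintained mapping of staff to skills. The names, skills, and the `skill_coverage` and `single_points_of_failure` helpers are hypothetical.

```python
def skill_coverage(staff: dict[str, set[str]], unavailable: set[str]) -> dict[str, int]:
    """Count how many still-available people hold each skill."""
    coverage = {skill: 0 for skills in staff.values() for skill in skills}
    for person, skills in staff.items():
        if person not in unavailable:
            for skill in skills:
                coverage[skill] += 1
    return coverage


def single_points_of_failure(staff: dict[str, set[str]], unavailable: set[str]) -> list[str]:
    """Skills that no available staff member can cover."""
    coverage = skill_coverage(staff, unavailable)
    return sorted(skill for skill, count in coverage.items() if count == 0)


# Hypothetical five-person IT team.
STAFF = {
    "ana": {"dba", "storage"},
    "raj": {"dba"},
    "lee": {"network", "vmware"},
    "kim": {"network"},
    "sam": {"vmware", "backup"},
}

if __name__ == "__main__":
    # A regional outage takes out 2 of 5 people (40 percent), who happen to be both DBAs.
    print(single_points_of_failure(STAFF, unavailable={"ana", "raj"}))
    # ['dba', 'storage']
```

Losing 40 percent of this team leaves nobody who can cover databases or storage, even though the remaining staff can still cover networking, VMware, and backups.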
What’s the Failback Plan?
A disaster has hit, and your processes worked. Your data, applications, and servers have failed over successfully! Awesome. Let’s break out the champagne, hit the air horns, blast off the confetti, and do all the celebratory things (personally, I’m throwing down the cardboard and busting a windmill-to-backspin combo for the ages).
It’s great that your disaster recovery plan worked, but what are the next steps? Are you permanently staying at your DR site? If not, how are you going to fail back your data?
Having a failback plan is just as critical as having a DR plan. What are those processes? For example, if you are using storage replication, do you need to redo the entire setup/seeding process? Can your process pick up where the original storage devices stopped? Or do you need to fully instantiate the storage at the original location?
Some solutions automatically flip the data replication direction as part of the failover. But what if the original site is offline for an extended time? How long can you store data changes before the original location is too far behind? Or, what if you’re using a DR company to host your DR? Are there higher, or additional, charges to run your now-production systems out of their facilities for an extended period?
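The "how far behind" question is mostly arithmetic. The sketch below is a minimal example. It assumes the DR side buffers changes in a fixed-size journal (or log) while the original site is offline, and that the backlog is later drained over a single replication link. Real products differ in how they track changes, so treat the function names and figures as hypothetical and check your vendor's documentation for the actual limits.

```python
def hours_until_journal_full(journal_gb: float, change_rate_gb_per_hour: float) -> float:
    """Hours the DR side can buffer changes for an offline original site
    before the journal fills and a full reseed is needed."""
    if change_rate_gb_per_hour <= 0:
        return float("inf")
    return journal_gb / change_rate_gb_per_hour


def catch_up_hours(backlog_gb: float, link_mbps: float,
                   change_rate_gb_per_hour: float) -> float:
    """Hours to drain a backlog over the replication link while new changes keep arriving."""
    # megabits/s -> gigabytes/hour (decimal units): * 3600 s, / 8 bits per byte, / 1000 MB per GB
    link_gb_per_hour = link_mbps * 3600 / 8 / 1000
    net_drain = link_gb_per_hour - change_rate_gb_per_hour
    if net_drain <= 0:
        return float("inf")  # the link never catches up
    return backlog_gb / net_drain


if __name__ == "__main__":
    rate = 25.0  # hypothetical GB of changed data per hour at the DR site
    print(hours_until_journal_full(journal_gb=2000, change_rate_gb_per_hour=rate))  # 80.0
    backlog = 72 * rate  # three days offline
    print(round(catch_up_hours(backlog, link_mbps=200, change_rate_gb_per_hour=rate), 1))  # 27.7
```

With a 2 TB journal and 25 GB of changes per hour, the original site can stay offline for about 80 hours before a full reseed is needed. After 72 hours offline, a 200 Mbps link would need roughly 28 hours to catch up, because new changes keep arriving while the backlog drains.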
Maybe you’re in one of those fancy public cloud environments, and it doesn’t matter that you are in a different region/availability zone. Embrace the change and viva la US-West-2!
Failback plans can be messy and expensive, even when thought out and practiced. Full practical tests may not be possible, but you can certainly make sure your company thinks about failback, talks about it, performs tabletop exercises, and works up theoretical runbooks. Being prepared can mean the difference between success and failure.
Is Backup Infrastructure in the DR Site?
This question is related to failback, but somewhat different. Your disaster recovery process worked like a charm, and now you are running out of your DR site. Great. Do you need to worry about backups and restores? Steve on the CRM team just accidentally deleted all of the client data. How are you addressing that? Do you have replica backup infrastructure in your DR site? Great. Make like a choose-your-own-adventure book and skip ahead to the next section.
Still here? What is the plan for your backups? Backups and restores still need to happen. Those needs don’t go on hold because of a disaster, and if your failback options are particularly involved, you might be working out of your DR site for an extended time. The “Steves” of the world are their own walking disasters.
What are some options? If you are in one of the public clouds, your backup systems can easily be included in your DR plans, or recreated in your new availability zone (AZ) or region. If it’s an on-premises solution, putting a replication pair in place works very nicely. Alternatively, if the DR location is also a working site for your company, have a process in place to use the backup infrastructure that already exists there.
Some companies have plans with their IT vendors to quickly procure the required infrastructure and implement it as soon as possible after a disaster. Why buy it until you absolutely need it? Save your money until it’s required. While that is an acceptable solution, I am not a fan; it leaves too many variables for my taste. Will there be stock at that time, or will other people be trying to purchase the exact same item(s)? Will there be shipping delays? Are you absolutely positive the datacenter/colo can support it physically and electrically? This is a lot to leave for the last minute.
DRaaS: How Many in the Area Use the Same Provider?
Outsourcing disaster recovery to a third-party company can be a great solution. But are they prepared for a larger regional disaster? How many other companies like yours are they doing business with? How many companies in your region also have DR plans that include using their facility in Phoenix, Atlanta, or Las Vegas?
If something like a hurricane were to pass through the Northeast, would your DR company be able to handle many companies failing over to the same facility? How many other clients does your “dedicated” DR manager have? Will you get the necessary attention and service in your time of need? Or will they be severely distracted by 10 or 15 other clients in the exact same situation?
Yes, I’m laying out a rather extreme situation. But natural disasters are more prevalent than ever: hurricanes hit the coasts more frequently, and the entire West Coast is susceptible to fires. Should you avoid using a third-party provider? Not necessarily. Just be informed, ask questions, and set realistic expectations. If you don’t get answers you like, explore alternatives, such as a different vendor or one of the cloud providers.
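When you ask those questions, it helps to know which numbers matter. You won't see a provider's full client list, but you can ask for enough figures to run a rough check like the one below. This is a minimal sketch. It assumes that recovery capacity can be expressed in a single unit (vCPUs here) and that every client in an affected region fails over at the same time. The `Client` class, the `regional_event` helper, the region labels, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    region: str  # where the client's primary site is
    vcpus: int   # recovery capacity the client expects at failover


def regional_event(clients: list[Client], region: str,
                   facility_vcpus: int, dr_managers: int) -> dict:
    """Model every client in one region failing over to the same facility at once."""
    if facility_vcpus <= 0 or dr_managers <= 0:
        raise ValueError("facility_vcpus and dr_managers must be positive")
    affected = [c for c in clients if c.region == region]
    demand = sum(c.vcpus for c in affected)
    return {
        "clients_failing_over": len(affected),
        "utilization": demand / facility_vcpus,  # above 1.0 means oversubscribed
        "clients_per_manager": len(affected) / dr_managers,
    }


if __name__ == "__main__":
    clients = ([Client(f"ne-{i}", "northeast", 400) for i in range(12)]
               + [Client(f"sw-{i}", "southwest", 300) for i in range(8)])
    print(regional_event(clients, "northeast", facility_vcpus=4000, dr_managers=1))
    # {'clients_failing_over': 12, 'utilization': 1.2, 'clients_per_manager': 12.0}
```

In this example, a dozen Northeast clients would request 120 percent of the facility's capacity, and a single DR manager would be juggling 12 clients at once.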
I hope these questions have given you food for thought. Disaster recovery is a massive area of IT with many facets, some of which seem minor until you brush up against them.
Knowing what you’re getting into and having a solid plan in place can make the difference between success and failure — and the continuance of your business.