TechNewsWorld.com
Rethinking Failsafes for Critical Linux Systems



The Linux operating system is highly compatible with two hot computing trends: virtualization and cloud computing. Just as the 2001-2002 recession helped usher in Linux as a mainstream solution, virtualization may accelerate Linux usage during and after the current recession. Linux already has a powerful presence in the database and ERP realms. Currently, for every US$3 spent worldwide on Windows-based servers, $1 is spent on Linux-based servers. Most organizations either already have Linux servers with critical information to protect or soon could.

Today's demanding information environment dictates the careful assessment of how to protect critical Linux servers and their information. It is important that

  1. the latest updates and configuration settings are automatically saved;
  2. the Linux server can fail over with little interruption;
  3. recovery point and recovery time objectives are low, and data is current;
  4. a target server can resynchronize after hours or days have passed; and
  5. database transactions are replicated over a WAN with write order preservation.

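Traditionally, there are two main ways to protect information on Linux servers: backup software and the rsync utility. However, neither has good answers to several important questions about system state and application restoration, failover to a standby target, and server resynchronization.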

Can a Linux System State and Associated Applications Be Successfully Restored?

Yes! The configurations of Linux and server applications are often customized during installation as well as during ongoing maintenance and general troubleshooting. Even servers with very similar functions are often configured differently. A primary goal of protecting a critical Linux server is being able to repair or replace the system and get it back into production quickly.

Even the best-documented changes can quickly become outdated, and gaps in that documentation often surface as errors only after the damage has been done. A process that automatically protects each server's unique configuration information allows those changes to be applied to a standby or replacement server for rapid recovery.
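
As a rough illustration of what "automatically protecting configuration information" could involve, the sketch below records a checksum for every file under a configuration directory and reports what changed between two snapshots. The function names and the idea of a checksum manifest are assumptions made for this example, not a description of any particular product.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each regular file under root to the SHA-256 of its contents."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        # Skip directories and symlinks; only hash real files.
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(root))] = digest
    return manifest


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report which files were added, removed, or changed between two manifests."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Run on a schedule against a directory such as /etc, a manifest like this would flag undocumented changes that need to be carried over to a standby or replacement server.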

Can a Linux Server Automatically Fail Over to a Standby Target for High Availability?

The key objective should be to get a server back to a functional condition where users can be productive. Many think that downtime is simply how long it takes to get data back. However, restoring lost data from backups can take several hours, if not days. Recovering an entire workload -- its operating system, applications and their data -- can be extremely complicated and take much longer if recovering from tape. In most cases, that is unacceptable; a different approach should be considered.

Restoring a system state is only the first step of recovery. Particular processes must still be restarted in the right order, and changes must be applied to local network and global DNS-type services. A process that can recover an entire Linux workload in a single step can greatly reduce the recovery time objective (RTO) and recovery point objective (RPO) and get the server back into production quickly.
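
The "right order" requirement can be pictured as a dependency graph. The following minimal sketch uses Python's standard graphlib module to compute a valid start order; the service names and their dependencies are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be running first.
DEPENDENCIES = {
    "network": set(),
    "storage": set(),
    "database": {"network", "storage"},
    "app-server": {"database"},
    "web-frontend": {"app-server", "network"},
}


def start_order(dependencies):
    """Return services in an order where every dependency precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for service in start_order(DEPENDENCIES):
        print("start", service)
```

If the map contains a circular dependency, TopologicalSorter raises graphlib.CycleError, which is a useful early warning that a recovery runbook cannot be executed as written.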

How Old Is the Last Backup?

Synchronizing data to a target in real time avoids out-of-date information and greatly improves recovery point objectives. Backups and snapshots are typically performed once to a few times per day, so even an hour of lost updates can take days to reconstruct or, worse, be gone forever.
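
The arithmetic behind that exposure is simple. As a sketch, assuming backups are evenly spaced through the day (an assumption for this example), the worst case is a failure just before the next backup runs, which loses one full interval of updates:

```python
def worst_case_loss_hours(backups_per_day: int) -> float:
    """Worst-case hours of lost updates, assuming evenly spaced backups."""
    if backups_per_day < 1:
        raise ValueError("at least one backup per day is required")
    # A failure immediately before the next backup loses a whole interval.
    return 24 / backups_per_day


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} backup(s) per day: up to {worst_case_loss_hours(n)} hours at risk")
```

With one nightly backup, up to 24 hours of changes are at risk; with four backups a day, up to six hours.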

Rsync is a utility included with most Linux distributions, but it cannot make changes to the operating system of a running target. Rsync also has no automatic method to fail over for availability or, more importantly, to fail back to recover the original Linux server.

When Should Servers Resynchronize After They Have Been Disconnected?

There are two main times when the ability to resynchronize between a source and target is most important. The first is during the initial protection phase. If a network disconnection occurs, the process to resynchronize must be efficient, and until that has been completed, the source has no effective protection.

The second is after a failover to a target has occurred. During that time, the target -- now acting as the source -- receives the ongoing production changes. When the original source becomes available again, its data will be out of sync with the target that is running as production. It is very important that the data be resynchronized as soon as possible so that a failback can return the system to production.
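
One common way to make resynchronization efficient is to compare checksums of fixed-size blocks and send only the blocks that differ. The sketch below is a simplified illustration of that idea using in-memory byte strings; the block size and function names are assumptions, and it is not rsync's rolling-checksum algorithm or any vendor's implementation. In practice only the digests and the changed blocks would cross the network.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a recommendation


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(source: bytes, target: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indexes of source blocks that are missing from or differ on the target."""
    src = block_digests(source, block_size)
    dst = block_digests(target, block_size)
    return [i for i, d in enumerate(src) if i >= len(dst) or d != dst[i]]


def resync(source: bytes, target: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Bring target in line with source by copying only the changed blocks."""
    out = bytearray(target[:len(source)])           # drop any excess on the target
    out.extend(bytes(len(source) - len(out)))       # pad if the source is longer
    for i in changed_blocks(source, target, block_size):
        start = i * block_size
        out[start:start + block_size] = source[start:start + block_size]
    return bytes(out)
```

After a long disconnection, a comparison like this avoids recopying the whole data set when only a small fraction of it changed.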

How Are Transactional Database Changes Captured to Preserve Write Order Integrity?

A target at a different location can protect against a primary site outage or disaster. Routing along multiple network paths ensures that the WAN itself is not a single point of failure. However, the multiple data transmissions that make up a single transaction can travel along different network paths, and they may arrive at the target in a very different order from the one in which they were sent.

For highly transactional applications like databases and email, the result can be that a target fails data integrity checks so that it cannot function as the source. This is similar to the need for these applications to be quiesced during backups to prevent "fuzzy backups" from occurring. There must be a mechanism to ensure the write order integrity of the transaction components and therefore properly protect a system.
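
One way such a mechanism can work, sketched here under the assumption that the source tags every write with an increasing sequence number, is for the target to hold back any write that arrives early and apply writes strictly in sequence. The class and method names are hypothetical.

```python
class OrderedApplier:
    """Buffer out-of-order writes and apply them strictly by sequence number."""

    def __init__(self) -> None:
        self.next_seq = 0      # sequence number expected next
        self.pending = {}      # writes that arrived early, keyed by sequence
        self.applied = []      # payloads in the order they were applied

    def receive(self, seq: int, payload: str) -> None:
        # Ignore duplicates of writes already applied or already buffered.
        if seq < self.next_seq or seq in self.pending:
            return
        self.pending[seq] = payload
        # Apply every write that is now contiguous with what came before.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


if __name__ == "__main__":
    target = OrderedApplier()
    for seq, payload in [(2, "COMMIT"), (0, "BEGIN"), (1, "UPDATE")]:
        target.receive(seq, payload)
    print(target.applied)
```

Even though the COMMIT arrives first in this example, it is not applied until the BEGIN and UPDATE that precede it have been applied.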

Adopting these measures for protecting critical Linux systems will not only keep them highly available but also reduce recovery time and recovery point objectives compared with traditional backup methods. Being able to recover an entire system state also makes it possible to recover onto a server that is not identical to the original, and it reduces the need to reinstall the operating system, applications and associated data, so the system returns to production more rapidly.


Brace Rennels is CBCP (certified business continuity professional) at Double-Take Software.

